Abstract

In this paper we consider Poisson loglinear models with linear constraints (LMLC) on the expected table counts. Multinomial and product multinomial loglinear models can be obtained by considering that some marginal totals (linear constraints on the expected table counts) have been prefixed in a Poisson loglinear model. Therefore, with the theory developed in this paper, multinomial and product multinomial loglinear models can be considered as particular cases. To carry out inferences on the parameters in the LMLC, an information-theoretic approach is followed, from which the classical maximum likelihood estimators and Pearson chi-square statistics for goodness-of-fit are obtained. In addition, nested hypotheses are proposed as a general procedure for hypothesis testing. Through a simulation study the appropriateness of the proposed inference tools is illustrated.

Keywords: Loglinear Model, Marginal Model, Sampling Scheme, Restricted Estimators, 
Phi-divergence Measures. 

1 Introduction

We consider a contingency table with $k$ cells, $n = (n_1, \ldots, n_k)^T$, with $n_i$ being the observed frequency associated with the $i$-th cell ($i = 1, \ldots, k$). Each $n_i$ is a Poisson random variable and, since all of them are mutually independent, the joint distribution of the contingency table is totally specified. Through a loglinear model $\log m(\theta) = X\theta$ a pattern is established for the mean vector of the contingency table, $m(\theta) = (m_1(\theta), \ldots, m_k(\theta))^T$, $m_i(\theta) = E[n_i]$, $i = 1, \ldots, k$, where $X$ is a known $k \times t$ full rank design matrix such that $t \le k$ and $\theta = (\theta_1, \ldots, \theta_t)^T \in \mathbb{R}^t$ is the vector of unknown parameters of the loglinear model.

Let

$$\mathcal{C}(X) = \{\log m(\theta) : \log m(\theta) = X\theta;\ \theta \in \mathbb{R}^t\}$$

be the range of loglinear models associated with $X$. We can observe that $\mathcal{C}(X)$ is the column space of the matrix $X$. A usual convention for loglinear models is to assume that the vector of 1's, $J_k = (1, \ldots, 1)^T$, belongs to $\mathcal{C}(X)$, and therefore if $J_k$ is taken as the first column of $X$, the first term $\theta_1$ of $\theta$ is referred to as the independent term of the model.

In order to make statistical inference in the class of loglinear models $\mathcal{C}(X)$, Cressie and Pardo [8] considered, for the first time in loglinear models, minimum $\phi$-divergence estimators and $\phi$-divergence test statistics. Later Martin and Pardo [22] presented a unified study for the three different sampling plans (multinomial, product-multinomial and Poisson).

To study some real situations on the basis of loglinear models, it is necessary to consider, in addition to loglinear models, some linear constraints. Loglinear models with linear constraints (LMLC) (see Definition 2.1) and product-multinomial sampling were considered for the first time by Haber and Brown [14]. One purpose of this paper is to consider divergence measures in order to make statistical inference (estimation and testing) in the class of LMLC, but not only with product-multinomial sampling. We shall present a joint study for different sampling plans (multinomial, product-multinomial and Poisson). In addition, this article highlights the fact that the choice of additional linear constraints is another way of nesting LMLC, in contrast to the traditional manner of nesting only loglinear constraints by reducing the number of columns of the design matrix. From this idea arises a new way of comparing LMLC that has not been previously considered in any paper and that covers the preexisting hypothesis-testing techniques as a special case (this point will be clarified in Section 4.2).

This article is organized as follows. In Section 2 we shall consider some notation as well as some preliminary concepts that will be important in the other sections of the paper. We pay special attention to the definition of phi-divergence measures between two non-negative vectors. Section 3 is devoted to defining and studying the minimum phi-divergence estimator of LMLC. The constrained estimation method will allow us to retain the advantages of dealing with Poisson loglinear models (in particular, a simpler estimation theory), even when the sampling plan is in fact multinomial or product-multinomial. Moreover, by an extension of such a method, if a marginal modeling itself is required, a compact estimation methodology is provided. As a generalization of the constrained maximum likelihood estimation method, the constrained minimum $\phi$-divergence estimation theory for LMLC is provided. Based also on $\phi$-divergences, in Section 4 some test statistics for LMLC are proposed; specifically, Section 4.1 is devoted to the problem of goodness-of-fit in LMLC and in Section 4.2 the problem of nested hypotheses in LMLC is studied. For both problems the asymptotic distribution of the $\phi$-divergence test statistics under the null hypothesis is obtained. From such $\phi$-divergence test statistics, in the case of the goodness-of-fit of LMLC, the classical likelihood ratio and Pearson chi-square test statistics, presented in Haber and Brown [14] for multinomial and product-multinomial sampling schemes, are obtained as special cases. In Section 5 three hypothesis tests, which share the aim of testing essentially a marginal model, are presented. The common framework of the LMLC developed in the previous sections will allow us to carry out an easier comparison between them. An example of the potential versatility of such models is shown in Remark 2.1 by considering apparently very different models, such as loglinear models and marginal models, within the same type of models. Some particular cases of loglinear models (symmetry, quasi-symmetry and ordinal quasi-symmetry) on one hand, and a marginal model on the other hand (marginal homogeneity model), are compared and the exact size and power of their hypothesis tests are analyzed.

2 Basic notation and definitions

By using a single index notation for $n$ we are able to unify a broad class of contingency tables; by convention the cells of multiway contingency tables are taken in lexicographical order and assigned a single index. For example, in the usual double index notation for a two-way $I \times J$ contingency table, $n = (n_{11}, n_{12}, \ldots, n_{1J}, \ldots, n_{I1}, n_{I2}, \ldots, n_{IJ})^T$, the count $n_{ab}$ can be expressed through the one-to-one index transformation $i = (a - 1)J + b$, and therefore $k = IJ$. In a three-way $I \times J \times K$ contingency table, $n = (n_{111}, n_{112}, \ldots, n_{11K}, \ldots, n_{1J1}, n_{1J2}, \ldots, n_{1JK}, \ldots, n_{I11}, n_{I12}, \ldots, n_{I1K}, \ldots, n_{IJ1}, n_{IJ2}, \ldots, n_{IJK})^T$, the count $n_{abc}$ can be expressed through the one-to-one index transformation $i = (a - 1)JK + (b - 1)K + c$, and therefore $k = IJK$. These models with single index notation are the so-called coordinate-free models (see Zelterman [31, Chapter 5]).
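As an illustration of the single index convention, the two transformations can be sketched in Python. The function names and the small $2 \times 3$ table are hypothetical and serve only to check that the lexicographic vector and the index formula agree.

```python
def single_index_2way(a, b, J):
    """1-based single index of cell (a, b) in an I x J table: i = (a - 1) * J + b."""
    return (a - 1) * J + b


def single_index_3way(a, b, c, J, K):
    """1-based single index of cell (a, b, c) in an I x J x K table."""
    return (a - 1) * J * K + (b - 1) * K + c


def flatten_2way(table):
    """Stack the rows of a two-way table in lexicographical order."""
    return [count for row in table for count in row]


# Hypothetical 2 x 3 table of counts (I = 2, J = 3, so k = 6).
table = [[10, 20, 30],
         [40, 50, 60]]
I, J = 2, 3
n = flatten_2way(table)

# The single index recovers every cell of the original table.
consistent = all(
    n[single_index_2way(a, b, J) - 1] == table[a - 1][b - 1]
    for a in range(1, I + 1)
    for b in range(1, J + 1)
)
```

The last cell always receives the index $k$, for instance `single_index_3way(2, 3, 4, J=3, K=4)` equals $IJK = 24$ for a $2 \times 3 \times 4$ table.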

A product-multinomial sampling plan can be obtained from a Poisson sampling plan with some additional linear constraints on the expected cell frequencies. Since $c$ independent contingency subtables $n_h = (n_{h1}, \ldots, n_{hk_h})^T$ are considered in a product-multinomial sampling plan, the whole contingency table is $n = (n_1^T, \ldots, n_c^T)^T$ and $k$ is the sum of the numbers of cells in the subtables, that is $k = \sum_{h=1}^{c} k_h$. The marginal total in each subtable, $\sum_{i=1}^{k_h} n_{hi} = J_{k_h}^T n_h$, $h = 1, \ldots, c$, is prefixed to be $N_h \in \mathbb{N}$, and hence for the mean vector $m(\theta) = (m_1(\theta)^T, \ldots, m_c(\theta)^T)^T$ with an underlying Poisson sampling plan the following $c$ linear constraints hold:

$$J_{k_h}^T m_h(\theta) = J_{k_h}^T n_h,\ h = 1, \ldots, c, \quad \text{or} \quad \Big(\bigoplus_{h=1}^{c} J_{k_h}\Big)^T m(\theta) = \Big(\bigoplus_{h=1}^{c} J_{k_h}\Big)^T n, \tag{1}$$

with $\bigoplus_{h=1}^{d} A_h = \mathrm{diag}\{A_1, \ldots, A_d\}$ representing the direct sum of $d$ matrices. In particular, for multinomial sampling, taking $c = 1$ we have

$$J_k^T m(\theta) = J_k^T n. \tag{2}$$

In what follows $c = 0$, i.e. the case where there is no linear restriction associated with the sampling plan, will mean that Poisson sampling itself is being considered.

It is well known that there are some equivalences between the inferential results associated with the parameters for the three sampling plans (see for instance Lang [19, 20] and Agresti [1, Section 14.4]). The main reason why the Poisson loglinear model is simpler to handle is the independence of the components of the sampling data.

The parameter space is given by

$$\Theta = \{\theta \in \mathbb{R}^t : X_0^T m(\theta) = X_0^T n\}, \tag{3}$$

where $X_0 = \bigoplus_{h=1}^{c} J_{k_h}$ if $c \ge 1$ and $X_0$ is a vector of zeros $0_k$ if $c = 0$ (i.e., $\Theta = \mathbb{R}^t$). When $c \ge 2$ a stronger assumption than $J_k \in \mathcal{C}(X)$ is taken into account, $\mathcal{C}(X_0) \subset \mathcal{C}(X)$, and therefore if $X_0$ is taken as the first $c$ columns of $X$, the $h$-th term $\theta_h$ ($h = 1, \ldots, c$) of $\theta$ is referred to as the independent term for the model focussed only on the $h$-th contingency subtable.
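The role of $X_0$ can be made concrete with a small sketch: the direct sum of vectors of ones is a $k \times c$ indicator matrix, and $X_0^T n$ collects the subtable totals that the sampling plan prefixes. The helper names and the counts below are hypothetical.

```python
def direct_sum_of_ones(sizes):
    """Return X0 = J_{k_1} (+) ... (+) J_{k_c} as a k x c matrix (list of rows)."""
    c = len(sizes)
    rows = []
    for h, k_h in enumerate(sizes):
        for _ in range(k_h):
            # Each cell of subtable h has a 1 in column h and 0 elsewhere.
            rows.append([1 if col == h else 0 for col in range(c)])
    return rows


def transpose_times(X, v):
    """Compute X^T v for a matrix X stored as a list of rows."""
    return [sum(X[i][j] * v[i] for i in range(len(X))) for j in range(len(X[0]))]


# Hypothetical product-multinomial table: c = 2 subtables with k_1 = 2 and k_2 = 3 cells.
sizes = [2, 3]
n = [4, 6, 1, 2, 7]

X0 = direct_sum_of_ones(sizes)
totals = transpose_times(X0, n)  # subtable totals N_1, N_2
```

With $c = 1$ the matrix reduces to the single column $J_k$, so that $X_0^T n$ is the overall total of the multinomial plan.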

Haber and Brown [14] considered multinomial and product multinomial LMLC but they did not consider the problem with Poisson sampling. Definition 2.1 is an extension of the definition given by Haber and Brown in which Poisson loglinear models are included.

Definition 2.1. In addition to the $c$ linear constraints of the sampling scheme, consider $r \le t - c$ linear constraints, $C^T m(\theta) = d^*$, i.e. $C$ and $d^*$ are $k \times r$ and $r \times 1$ matrices respectively. Once a loglinear model is established through a design matrix $X$, a loglinear model with linear constraints is a simultaneous modeling of $m(\theta)$ through a loglinear pattern on one hand and a linear pattern on the other hand,

$$\log m(\theta) = X\theta \quad \text{and} \quad L^T m(\theta) = d, \tag{4}$$

with $L = (X_0, C)$, $d = (n^T X_0, (d^*)^T)^T$, for $c \ge 1$, and $L = C$, $d = d^*$, for $c = 0$. It is also assumed that $k > t - c - r$ and $\mathrm{rank}(L) = \mathrm{rank}(L^T, d) = c + r$.

Several examples are shown in Haber and Brown [14] for $c \ge 1$ and an application for $c = 0$ and $r \ge 1$ is suggested in Gail [13, Section 5] (for more details see Pardo and Martin [26]).

The parameter space of (4) is given by

$$\Theta = \{\theta \in \mathbb{R}^t : L^T m(\theta) = d\}. \tag{5}$$

It has been pointed out that in most practical cases $d^* = 0_r$; indeed, this holds in all the examples of Haber and Brown [14]. Observe that $d^* = (d_1^*, \ldots, d_r^*)^T$ has been assumed to be constant; in fact, if $d_j^*$ ($j \in \{1, \ldots, r\}$) is proportional to $N = \sum_{i=1}^{k} m_i(\theta)$ there exists another equivalent constraint where $d_j^* = 0$.

In establishing asymptotic properties, we let $N$ tend to infinity, and under this condition it is assumed that the normalized vector $m^*(\theta) = m(\theta)/N$ remains fixed. For $c \ge 1$ this implies that, as $N \to \infty$, the probabilities in each cell remain fixed and $N_h/N$, $h = 1, \ldots, c$, also remain fixed.

Remark 2.1. It is interesting to observe that two cases of LMLC can be considered:

i) The classical loglinear models without linear constraints, which are only defined through the loglinear pattern $\log m(\theta) = X\theta$, and thus $r = 0$ and $L = X_0$.

ii) The marginal models, which are only defined through the linear pattern $L^T m(\theta) = d$; by taking $X$ to be the identity matrix of order $k$, $I_k$ (i.e., $k = t$), the loglinear pattern is not itself a restriction.

In the particular case $c = 0$ and $r = 0$, i.e., Poisson loglinear models without linear constraints, Cressie and Pardo [8] considered the problem of testing using divergence measures between probability vectors, solving the problem of estimation with the maximum likelihood estimator. Later, in Martin and Pardo [22], the problem of estimation and testing was considered using divergence measures between nonnegative vectors, but only for $r = 0$. In this paper the results obtained in Martin and Pardo [22] are extended to any $r \ge 0$. In this extension we consider the $\phi$-divergence measure between nonnegative vectors.

Let $\Phi$ be the class of all convex and differentiable functions $\phi : [0, \infty) \to \mathbb{R} \cup \{\infty\}$ such that at $x = 1$, $\phi(1) = \phi'(1) = 0$, $\phi''(1) > 0$. A $\phi$-divergence measure between the nonnegative $k$-vectors $a = (a_1, \ldots, a_k)^T$ and $b = (b_1, \ldots, b_k)^T$ is given by

$$D_\phi(a, b) = \sum_{i=1}^{k} b_i\, \phi\!\left(\frac{a_i}{b_i}\right), \quad \phi \in \Phi, \tag{6}$$

where the conventions $0\,\phi(0/0) = 0$ and $0\,\phi(p/0) = p \lim_{u \to \infty} \phi(u)/u$ are assumed. These measures cover the traditional ones for probabilistic arguments, analyzed in Pardo [24], and all of them share similar properties. In particular, by taking $\lambda \in \mathbb{R}$ and

$$\phi_{(\lambda)}(x) = \frac{x^{\lambda+1} - x - \lambda(x - 1)}{\lambda(\lambda + 1)}, \quad \text{if } \lambda(\lambda + 1) \neq 0, \tag{7}$$

and $\phi_{(\lambda^*)}(x) = \lim_{\lambda \to \lambda^*} \phi_{(\lambda)}(x)$, if $\lambda^* \in \{0, -1\}$, the power divergence measures, introduced in Cressie and Read [9], are obtained. The so-called Kullback divergence measure is obtained through $\phi_{(0)}(x) = x \log x - x + 1$,

$$D_{Kull}(a, b) = D_{\phi_{(0)}}(a, b) = \sum_{i=1}^{k} a_i \log\!\left(\frac{a_i}{b_i}\right) - \sum_{i=1}^{k} a_i + \sum_{i=1}^{k} b_i, \tag{8}$$

which was given between two non-negative vectors for the first time in Brockett [3J. It should 
be pointed out that the asymptotic results in [22] were obtained with the parameter vector as the primary focus and the mean vector as a secondary aim; this is the opposite of the route followed in other works on loglinear modeling (see for instance Lang [18]), where the primary aim is the mean vector itself. These measures, and also the methodology for developing asymptotic results, remain useful for obtaining the asymptotic results associated with LMLC. Taking into account Remark 2.1, it is important to clarify that, apart from the possibility of reproducing all inferential results obtained previously in Martin and Pardo [22], the new results of this paper are important because the LMLC cover a broad range of models.
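A minimal numerical sketch of the power divergence family between nonnegative vectors may help fix ideas. It assumes strictly positive entries, so the conventions for zero arguments are not needed; the function names and the two vectors are hypothetical.

```python
import math


def phi_power(lam):
    """Cressie-Read function phi_(lambda); lambda = 0 and lambda = -1 use the limits."""
    def phi(x):
        if lam == 0:
            return x * math.log(x) - x + 1
        if lam == -1:
            return -math.log(x) + x - 1
        return (x ** (lam + 1) - x - lam * (x - 1)) / (lam * (lam + 1))
    return phi


def phi_divergence(a, b, phi):
    """D_phi(a, b) = sum_i b_i * phi(a_i / b_i), assuming a_i > 0 and b_i > 0."""
    return sum(b_i * phi(a_i / b_i) for a_i, b_i in zip(a, b))


def kullback(a, b):
    """Kullback divergence between nonnegative vectors, written out directly."""
    return (sum(a_i * math.log(a_i / b_i) for a_i, b_i in zip(a, b))
            - sum(a) + sum(b))


# Hypothetical nonnegative vectors (they need not sum to one, nor to the same total).
a = [12.0, 7.0, 5.0]
b = [10.0, 8.0, 6.0]

d_kull = phi_divergence(a, b, phi_power(0))    # lambda = 0: Kullback divergence
d_pearson = phi_divergence(a, b, phi_power(1))  # lambda = 1: half the Pearson distance
```

For $\lambda = 1$ the function reduces to $\phi_{(1)}(x) = \frac{1}{2}(x - 1)^2$, so `d_pearson` equals $\frac{1}{2}\sum_i (a_i - b_i)^2 / b_i$.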

3 Minimum $\phi$-divergence estimator

The maximum likelihood estimator (MLE) $\widehat{\theta}$ of the parameter $\theta$ in (4) can be obtained by maximizing the kernel of the Poisson log-likelihood

$$\ell(n, m(\theta)) = \sum_{i=1}^{k} n_i \log m_i(\theta) - \sum_{i=1}^{k} m_i(\theta),$$

subject to the constraints $L^T m(\theta) = d$, i.e. on the basis of (5)

$$\widehat{\theta} = \arg\max_{\theta \in \Theta} \ell(n, m(\theta)).$$

Observe that, according to Definition 2.1, if $X_0 = \bigoplus_{h=1}^{c} J_{k_h}$, which is a submatrix of $L$, the underlying sampling plan is product-multinomial (or multinomial, if $c = 1$). In what follows, even when (product) multinomial sampling is not explicitly mentioned, all results cover this sampling plan.

On the basis of (8) we have

$$D_{Kull}(n, m(\theta)) = \sum_{i=1}^{k} n_i \log n_i - \sum_{i=1}^{k} n_i - \ell(n, m(\theta)),$$

and it is also possible to define the MLE of the parameter in (4) by

$$\widehat{\theta} = \arg\min_{\theta \in \Theta} D_{Kull}(n, m(\theta)).$$
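The equivalence between maximizing the Poisson kernel and minimizing the Kullback divergence can be checked numerically. The following sketch does not solve the constrained problem; it only verifies, for two hypothetical candidate mean vectors, that the two criteria differ by a term that does not depend on the mean vector and therefore rank the candidates in opposite order.

```python
import math


def poisson_kernel(n, m):
    """Kernel of the Poisson log-likelihood: sum n_i log m_i - sum m_i."""
    return sum(n_i * math.log(m_i) for n_i, m_i in zip(n, m)) - sum(m)


def kullback(n, m):
    """Kullback divergence between the nonnegative vectors n and m."""
    return (sum(n_i * math.log(n_i / m_i) for n_i, m_i in zip(n, m))
            - sum(n) + sum(m))


# Hypothetical observed counts and two hypothetical candidate mean vectors.
n = [12, 7, 5, 9]
candidates = {
    "m_a": [10.0, 8.0, 6.0, 9.0],
    "m_b": [11.5, 7.5, 5.0, 9.0],
}

# Term depending only on the data: sum n_i log n_i - sum n_i.
data_term = sum(n_i * math.log(n_i) for n_i in n) - sum(n)

identity_gaps = {
    name: kullback(n, m) - (data_term - poisson_kernel(n, m))
    for name, m in candidates.items()
}
best_by_likelihood = max(candidates, key=lambda name: poisson_kernel(n, candidates[name]))
best_by_divergence = min(candidates, key=lambda name: kullback(n, candidates[name]))
```

Both selections coincide, which is the finite-candidate analogue of the argmax and argmin definitions of the MLE.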

Rather than using the MLE, one could use divergence-based methods for estimating the parameters of loglinear models with linear constraints. On the basis of (6), a minimum $\phi$-divergence estimator (M$\phi$E) for a LMLC, given in Definition 2.1, is defined as follows.

Definition 3.1. For a LMLC with parameter space $\Theta$ the M$\phi$E is given by

$$\widehat{\theta}^{\phi} = \arg\min_{\theta \in \Theta} D_\phi(n, m(\theta)), \tag{9}$$

with $D_\phi(n, m(\theta))$ defined by (6).

In Aitchison and Silvey [3] a method for finding MLE's subject to constraints, together with its asymptotic distribution theory, was developed for the first time using the Lagrange multiplier method. In Pardo et al. [25] a M$\phi$E procedure was introduced for multinomial models in which the probabilities depend on unknown parameters that satisfy some functional relationships. Following the latter method, but more generally in the sense that the probabilities are replaced by means, in the following theorem we present the key result for developing the asymptotic distribution theory for LMLC: the decomposition of the M$\phi$E for the parameter vector.

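Theorem 3.1. Suppose that the data $n = (n_1, \ldots, n_k)^T$ are Poisson distributed with mean vector given by a LMLC (4). Choosing a function $\phi \in \Phi$, where $\Phi$ was defined in Section 2, we have

$$\widehat{\theta}^{\phi} = \theta_0 + H(\theta_0) X^T \left(\frac{n}{N} - m^*(\theta_0)\right) + o\!\left(\left\|\frac{n}{N} - m^*(\theta_0)\right\|\right),$$

where

$$H(\theta_0) = I_F(\theta_0)^{-1} - I_F(\theta_0)^{-1} B(\theta_0) \left(B(\theta_0)^T I_F(\theta_0)^{-1} B(\theta_0)\right)^{-1} B(\theta_0)^T I_F(\theta_0)^{-1},$$

$$I_F(\theta_0) = X^T D_{m^*(\theta_0)} X \quad \text{(Fisher information matrix associated with the Poisson loglinear model)},$$

$$B(\theta_0) = X^T D_{m^*(\theta_0)} L,$$

$D_{m^*(\theta_0)}$ is the diagonal matrix of the normalized vector $m^*(\theta_0)$ and $\theta_0 \in \Theta$ is the true and unknown value of the parameter.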

Proof. We omit the proof because its steps are similar to those given in Martin and Pardo [21]; the differences arise because only multinomial sampling was considered in the cited paper. □

In the next theorem we obtain the asymptotic distribution of $\widehat{\theta}^{\phi}$ as well as of $m(\widehat{\theta}^{\phi})$.

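Theorem 3.2. Suppose that the data $n = (n_1, \ldots, n_k)^T$ are Poisson distributed with mean vector given by a LMLC (4). Choosing a function $\phi \in \Phi$, where $\Phi$ was defined in Section 2, we have

a)

$$\sqrt{N}\,(\widehat{\theta}^{\phi} - \theta_0) \xrightarrow[N \to \infty]{\mathcal{L}} \mathcal{N}(0_t, H(\theta_0)), \tag{10}$$

where $H(\theta_0)$ is defined in Theorem 3.1 and "$\xrightarrow{\mathcal{L}}$" denotes convergence in law (or distribution), and

b)

$$\frac{1}{\sqrt{N}}\left(m(\widehat{\theta}^{\phi}) - m(\theta_0)\right) \xrightarrow[N \to \infty]{\mathcal{L}} \mathcal{N}(0_k, \Sigma), \tag{11}$$

where

$$\Sigma = D_{m^*(\theta_0)} X H(\theta_0) X^T D_{m^*(\theta_0)}.$$

Proof. Result a) follows from Theorem 3.1, taking into account (see Haberman [16])

$$\frac{1}{\sqrt{N}}\,(n - m(\theta_0)) \xrightarrow[N \to \infty]{\mathcal{L}} \mathcal{N}(0_k, D_{m^*(\theta_0)}). \tag{12}$$

Part b) follows from a) by applying the delta method (see for instance Agresti [1, Sections 14.1.2, 14.1.3]). □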



POISSON LOGLINEAR MODELING WITH LINEAR CONSTRAINTS 






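In the next theorem a simplification of the expression of the asymptotic variance-covariance matrices of Theorem 3.2 is shown.

Theorem 3.3. When

$$X = (L, W) \tag{13}$$

we have

$$H(\theta_0) = \left(X^T D_{m^*(\theta_0)} X\right)^{-1} - \left(L^T D_{m^*(\theta_0)} L\right)^{-1} \oplus 0_{(t-c-r) \times (t-c-r)}$$

and

$$\Sigma = D_{m^*(\theta_0)}^{1/2} \left(A_X(\theta_0) - A_L(\theta_0)\right) D_{m^*(\theta_0)}^{1/2},$$

where

$$A_X(\theta_0) = D_{m^*(\theta_0)}^{1/2} X \left(X^T D_{m^*(\theta_0)} X\right)^{-1} X^T D_{m^*(\theta_0)}^{1/2},$$

$$A_L(\theta_0) = D_{m^*(\theta_0)}^{1/2} L \left(L^T D_{m^*(\theta_0)} L\right)^{-1} L^T D_{m^*(\theta_0)}^{1/2}.$$

We can observe that $A_X(\theta_0)$ and $A_L(\theta_0)$ are projection matrices onto the column spaces $\mathcal{C}(D_{m^*(\theta_0)}^{1/2} X)$ and $\mathcal{C}(D_{m^*(\theta_0)}^{1/2} L)$ respectively.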

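Proof. Starting from the identity matrix,

$$I_t = \left(X^T D_{m^*(\theta_0)} X\right)^{-1} X^T D_{m^*(\theta_0)} X = \left(X^T D_{m^*(\theta_0)} X\right)^{-1} X^T D_{m^*(\theta_0)} (L, W)$$

$$= \left(\left(X^T D_{m^*(\theta_0)} X\right)^{-1} X^T D_{m^*(\theta_0)} L,\ \left(X^T D_{m^*(\theta_0)} X\right)^{-1} X^T D_{m^*(\theta_0)} W\right),$$

it is obtained that

$$G_{t \times (c+r)} = \left(X^T D_{m^*(\theta_0)} X\right)^{-1} X^T D_{m^*(\theta_0)} L = \begin{pmatrix} I_{c+r} \\ 0_{(t-c-r) \times (c+r)} \end{pmatrix}.$$

Therefore

$$H(\theta_0) = \left(X^T D_{m^*(\theta_0)} X\right)^{-1} - \left(X^T D_{m^*(\theta_0)} X\right)^{-1} X^T D_{m^*(\theta_0)} L \left(L^T D_{m^*(\theta_0)} X \left(X^T D_{m^*(\theta_0)} X\right)^{-1} X^T D_{m^*(\theta_0)} L\right)^{-1} L^T D_{m^*(\theta_0)} X \left(X^T D_{m^*(\theta_0)} X\right)^{-1}$$

$$= \left(X^T D_{m^*(\theta_0)} X\right)^{-1} - G_{t \times (c+r)} \left(L^T D_{m^*(\theta_0)} L\right)^{-1} G_{t \times (c+r)}^T$$

$$= \left(X^T D_{m^*(\theta_0)} X\right)^{-1} - \left(L^T D_{m^*(\theta_0)} L\right)^{-1} \oplus 0_{(t-c-r) \times (t-c-r)}.$$

On the other hand, the expression of $\Sigma$ is obtained by replacing the new expression of $H(\theta_0)$ in its original definition in Theorem 3.2. □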

Remark 3.1. When $r = 0$ and $c = 1$ (classical multinomial loglinear model), $X = (J_k, W)$, $J_k^T D_{m^*(\theta_0)} J_k = 1$ and thus $H(\theta_0) = \left(X^T D_{m^*(\theta_0)} X\right)^{-1} - 1 \oplus 0_{(t-1) \times (t-1)}$ as a direct application of Theorem 3.3. If we pay attention to the structure of this variance-covariance matrix, we can observe that $\theta = (\theta_1, \widetilde{\theta}^T)^T$ is partitioned in such a way that once the part associated with $W$, $\widetilde{\theta} = (\theta_2, \ldots, \theta_t)^T$, is known, the first component $\theta_1$ can be obtained through $\widetilde{\theta}$ and the linear constraint. This is the reason why in traditional multinomial loglinear modeling the dimension of the parameter space is $t - 1$ instead of $t$, and $\theta_1 = \log\!\left(N / (J_k^T \exp\{W \widetilde{\theta}\})\right)$ is the redundant component of the multinomial loglinear model. When $c \ge 0$ and $r \ge 1$, it is possible to partition and interpret any parameter vector in terms of (13). Due to space limitations, we omit the formal treatment. Less formally, by transforming the design or restriction matrices, it is possible to obtain a LMLC whose design matrix has the structure given in (13). The parameters associated with the matrix $W$ are "free parameters", while the remaining terms are determined as a function of them. Textbooks frequently consider only the free parameters when making statistical inferences.
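The redundancy of the first component in the multinomial case can be illustrated numerically. In the sketch below, which uses a hypothetical matrix $W$, hypothetical free parameters and a hypothetical total $N$, the intercept is computed on the log scale, since $\log m(\theta) = J_k \theta_1 + W\widetilde{\theta}$ and the constraint $J_k^T m(\theta) = N$ give $e^{\theta_1} J_k^T \exp\{W\widetilde{\theta}\} = N$.

```python
import math


def intercept_from_free_parameters(W, theta_free, N):
    """Solve J_k^T m(theta) = N for theta_1, given the free parameters.

    Assumes log m_i = theta_1 + w_i^T theta_free, with W stored as a list of rows w_i.
    """
    linear = [sum(w_ij * t_j for w_ij, t_j in zip(row, theta_free)) for row in W]
    return math.log(N / sum(math.exp(v) for v in linear))


def mean_vector(W, theta_1, theta_free):
    """Mean vector m(theta) = exp(theta_1 + W theta_free)."""
    return [
        math.exp(theta_1 + sum(w_ij * t_j for w_ij, t_j in zip(row, theta_free)))
        for row in W
    ]


# Hypothetical multinomial loglinear model with k = 3 cells and t = 2 parameters.
W = [[0.0], [1.0], [2.0]]
theta_free = [0.3]
N = 50

theta_1 = intercept_from_free_parameters(W, theta_free, N)
m = mean_vector(W, theta_1, theta_free)
```

Whatever value the free parameters take, the resulting mean vector adds up to $N$, so only $t - 1$ components of $\theta$ can vary freely.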



4 $\phi$-divergence test statistics
4.1 Goodness-of-fit

Classical measures for assessing the goodness-of-fit of categorical data models estimated by MLE are the likelihood ratio test statistic, sometimes referred to as the deviance statistic,

$$G^2(\widehat{\theta}) = 2 \sum_{i=1}^{k} \left(n_i \log \frac{n_i}{m_i(\widehat{\theta})} - \left(n_i - m_i(\widehat{\theta})\right)\right), \tag{14}$$

and the Pearson chi-square test statistic

$$X^2(\widehat{\theta}) = \sum_{i=1}^{k} \frac{\left(n_i - m_i(\widehat{\theta})\right)^2}{m_i(\widehat{\theta})}. \tag{15}$$

In Haber and Brown [14] the asymptotic distribution of the classical goodness-of-fit test statistics for LMLC when the sampling scheme is (product) multinomial ($c \ge 1$, $r \ge 0$) was established. On the other hand, in Martin and Pardo [22] divergence-based goodness-of-fit test statistics were analyzed for loglinear models under the three sampling schemes ($c \ge 0$) when no constraints additional to the sampling ones are considered ($r = 0$). In this section we extend the previous result to the important context in which $r \ge 0$. In this framework the family of $\phi$-divergence test statistics is given by

$$T^{\phi_1}(\widehat{\theta}^{\phi_2}) = \frac{2}{\phi_1''(1)}\, D_{\phi_1}\!\left(n, m(\widehat{\theta}^{\phi_2})\right). \tag{16}$$

Observe that while the divergence-based estimator is associated with a specific function $\phi_2$, the divergence-based test statistic is associated with a function $\phi_1$, not necessarily equal to $\phi_2$. In fact, while $G^2(\widehat{\theta}) = T^{\phi_{(0)}}(\widehat{\theta}^{\phi_{(0)}})$, where $\phi_{(0)}(x) = x \log x - x + 1$, it holds that $X^2(\widehat{\theta}) = T^{\phi_{(1)}}(\widehat{\theta}^{\phi_{(0)}})$, where $\phi_{(1)}(x) = \frac{1}{2}(x - 1)^2$.
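The fact that the likelihood ratio and Pearson statistics are two members of the same family can be verified with a short sketch. It assumes the form $T^{\phi_1} = \frac{2}{\phi_1''(1)} D_{\phi_1}(n, m)$ evaluated at a given fitted mean vector; the counts, the fitted means and the function names are hypothetical, and no estimation is carried out.

```python
import math


def divergence_statistic(n, m, phi, phi_second_at_one=1.0):
    """T = (2 / phi''(1)) * sum_i m_i * phi(n_i / m_i), for positive n_i and m_i."""
    d_phi = sum(m_i * phi(n_i / m_i) for n_i, m_i in zip(n, m))
    return 2.0 / phi_second_at_one * d_phi


def likelihood_ratio(n, m):
    """Classical deviance G^2 for observed counts n and fitted means m."""
    return 2.0 * sum(n_i * math.log(n_i / m_i) - (n_i - m_i) for n_i, m_i in zip(n, m))


def pearson(n, m):
    """Classical Pearson X^2 for observed counts n and fitted means m."""
    return sum((n_i - m_i) ** 2 / m_i for n_i, m_i in zip(n, m))


def phi_0(x):
    """phi_(0)(x) = x log x - x + 1, with second derivative 1 at x = 1."""
    return x * math.log(x) - x + 1


def phi_1(x):
    """phi_(1)(x) = (x - 1)^2 / 2, with second derivative 1 at x = 1."""
    return 0.5 * (x - 1) ** 2


# Hypothetical observed counts and hypothetical fitted mean vector.
n = [12, 7, 5, 9]
m_hat = [10.0, 8.0, 6.0, 9.0]

t_phi0 = divergence_statistic(n, m_hat, phi_0)
t_phi1 = divergence_statistic(n, m_hat, phi_1)
```

Other choices of $\phi_1$ in the class $\Phi$ can be plugged in the same way, provided the value of $\phi_1''(1)$ is supplied.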

In the following theorem we establish that the asymptotic distribution of the family of $\phi$-divergence test statistics $T^{\phi_1}(\widehat{\theta}^{\phi_2})$ is a chi-square with $k - t + c$ degrees of freedom ($\chi^2_{k-t+c}$). Therefore, we reject the null hypothesis that the model is (4) if $T^{\phi_1}(\widehat{\theta}^{\phi_2}) > c_\alpha$, where the cutpoint $c_\alpha$ is specified so that the size of the test is $\alpha$, $\Pr(T^{\phi_1}(\widehat{\theta}^{\phi_2}) > c_\alpha \mid H_{Null}) = \Pr(\chi^2_{k-t+c} > c_\alpha) = \alpha$, i.e. $c_\alpha = \chi^2_{k-t+c}(1 - \alpha)$ is the $(1 - \alpha)$-th quantile of a $\chi^2_{k-t+c}$ distribution.

Theorem 4.1. Suppose that the data $n = (n_1, \ldots, n_k)^T$ are Poisson distributed. Choose the functions $\phi_1, \phi_2 \in \Phi$, where $\Phi$ was defined in Section 2. Then, for testing

$$H_{Null}: \log m(\theta) \in \mathcal{C}(X) \text{ and } \theta \in \Theta = \{\theta' \in \mathbb{R}^t : L^T m(\theta') = d\}, \tag{17}$$

$$H_{Alt}: \log m(\theta) \notin \mathcal{C}(X) \text{ or } \theta \notin \Theta = \{\theta' \in \mathbb{R}^t : L^T m(\theta') = d\},$$

the asymptotic null distribution of the test statistic $T^{\phi_1}(\widehat{\theta}^{\phi_2})$, given in (16), is chi-squared with $k - t + c$ degrees of freedom.

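Proof. We consider the function $f(x, y) = x\,\phi_1(y/x)$. A second order Taylor expansion of $f(m_i^*(\widehat{\theta}^{\phi_2}), n_i/N)$ about $(m_i^*(\theta_0), m_i^*(\theta_0))$ gives

$$T^{\phi_1}(\widehat{\theta}^{\phi_2}) = N \sum_{i=1}^{k} \frac{\left(n_i/N - m_i^*(\widehat{\theta}^{\phi_2})\right)^2}{m_i^*(\theta_0)} + o_P(1).$$

Taking into account

$$N \sum_{i=1}^{k} \frac{\left(n_i/N - m_i^*(\widehat{\theta}^{\phi_2})\right)^2}{m_i^*(\theta_0)} = \sum_{i=1}^{k} Z_i^2,$$

where

$$Z_i = \frac{n_i - m_i(\widehat{\theta}^{\phi_2})}{\sqrt{m_i(\theta_0)}},$$

we obtain the following vectorial expression

$$T^{\phi_1}(\widehat{\theta}^{\phi_2}) = Z^T Z + o_P(1),$$

with

$$Z = (Z_1, \ldots, Z_k)^T = D_{m(\theta_0)}^{-1/2}\left(n - m(\widehat{\theta}^{\phi_2})\right).$$

The random vector $Z$ is asymptotically normally distributed with mean vector zero and asymptotic variance-covariance matrix

$$T^* = I_k - A_0(\theta_0) - D_{m^*(\theta_0)}^{1/2} X H(\theta_0) X^T D_{m^*(\theta_0)}^{1/2}, \tag{18}$$

where $A_0(\theta_0)$ is given by

$$A_0(\theta_0) = D_{m^*(\theta_0)}^{1/2} X_0 \left(X_0^T D_{m^*(\theta_0)} X_0\right)^{-1} X_0^T D_{m^*(\theta_0)}^{1/2}. \tag{19}$$

Then, the asymptotic distribution of the $\phi$-divergence test statistic $T^{\phi_1}(\widehat{\theta}^{\phi_2})$ will be a chi-square iff the matrix $T^*$ is idempotent and symmetric. It is clear that $T^*$ is symmetric, and to establish that it is idempotent we have

$$(T^*)^2 = SS - SK - KS + KK = S - K - K + K = T^*,$$

where $S = I_k - A_0(\theta_0)$ and $K = D_{m^*(\theta_0)}^{1/2} X H(\theta_0) X^T D_{m^*(\theta_0)}^{1/2}$. The degrees of freedom of the chi-squared distributed statistic $T^{\phi_1}(\widehat{\theta}^{\phi_2})$ coincide with the trace of the matrix $T^*$, i.e. $k - t + c$. □

Remark 4.1. When $\mathcal{C}(L) \subset \mathcal{C}(X)$, because $\mathcal{C}(D_{m^*(\theta_0)}^{1/2} L) \subset \mathcal{C}(D_{m^*(\theta_0)}^{1/2} X)$ it holds that $A_X(\theta_0) D_{m^*(\theta_0)}^{1/2} L = D_{m^*(\theta_0)}^{1/2} L$, which means that (18) is given by

$$T^* = I_k - A_0(\theta_0) - A_X(\theta_0) + A_L(\theta_0). \tag{20}$$

From this expression it is concluded that when there is no constraint additional to the sampling ones ($r = 0 \Rightarrow A_L(\theta_0) = A_0(\theta_0)$), the variance-covariance matrix (18) of the random vector $Z$, under the assumption that the model of the null hypothesis in (17) holds, has a common expression, $T^* = I_k - A_X(\theta_0)$, for the three sampling schemes ($c \ge 0$).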

4.2 Nested hypothesis 

Two models are said to be nested if one of them can be obtained from the other as a special case. This general definition for linear models (see Chatterjee and Hadi [6, page 65]) can be applied to two loglinear models, whose design matrices are given by $X_1$ and $X_2$, in such a way that the first one is said to be nested within the second one if $\mathcal{C}(X_1) \subset \mathcal{C}(X_2)$. Observe that $\mathrm{rank}(X_1) = t_1 < \mathrm{rank}(X_2) = t_2$, and therefore $\Theta_1 = \{\theta' \in \mathbb{R}^{t_1} : X_0^T m(\theta') = X_0^T n\} \subset \Theta_2 = \{\theta' \in \mathbb{R}^{t_2} : X_0^T m(\theta') = X_0^T n\}$, i.e. $\forall \theta_1 \in \Theta_1$ $\exists \theta_2 \in \Theta_2 : m(\theta_1) = m(\theta_2)$. Moreover, if $X_2 = (X_1, Y_2)$, where $\mathrm{rank}(Y_2) = s_2$ (i.e., $t_2 = t_1 + s_2$), by considering $\theta_2 = (\theta_1^T, 0_{s_2}^T)^T$ it holds that $m(\theta_1) = m(\theta_2)$, and thus the loglinear model defined by $X_1$ is nested within the loglinear model defined by $X_2$. In order to clarify that this is a particular case of nested model, a loglinear model defined by $X_1$ is said to be a reduced loglinear model of $X_2 = (X_1, Y_2)$.

In the following definition we consider a sequence of design matrices $\{X_b\}_{b=1}^{B}$ such that the loglinear model associated with $X_{b+1} = (L, W_{b+1})$ is a reduced loglinear model of $X_b = (L, W_b)$, $b = 1, \ldots, B - 1$, which means that $W_{b+1}$ is a submatrix of $W_b$. Such matrices define a sequence of LMLC that share the same linear constraints.

Definition 4.1. The sequence of LMLC

$$\log m(\theta_b) = X_b \theta_b \quad \text{and} \quad L^T m(\theta_b) = d, \tag{21}$$

where $X_b = (x_1, \ldots, x_{t-b+1})$, $b \in \{1, \ldots, t - c - r\}$, is called the $b$-th reduced LMLC through the parameter, because by reducing by one unit the dimension of the parameter space $\Theta_b = \{\theta_b \in \mathbb{R}^{t-b+1} : L^T m(\theta_b) = d\}$, it holds that $M_{b+1} \subset M_b$, where

$$M_b = \{m(\theta_b) \in \mathbb{R}^k : \log m(\theta_b) = X_b \theta_b,\ \theta_b \in \Theta_b\}.$$

In the following definition we consider a sequence of constraints $\{L_b^T m(\theta) = d_b\}_{b=1}^{B}$ such that $(L_b, d_b)$ is a submatrix of $(L_{b+1}, d_{b+1})$. Such constraints define a sequence of LMLC that share the same design matrix $X = (L_b, W_b)$.

Definition 4.2. The sequence of LMLC

$$\log m(\theta) = X\theta \quad \text{and} \quad L_b^T m(\theta) = d_b, \tag{22}$$

where $X = (x_1, \ldots, x_t)$, $L_1 = (x_1, \ldots, x_{c+r})$ and $L_{b+1} = (L_b, x_{c+r+b})$, $b \in \{1, \ldots, t - c - r\}$, is called the $b$-th reduced LMLC through the constraints, because by increasing by one unit the number of constraints, since $\Theta_{b+1} \subset \Theta_b$ with the parameter space given by $\Theta_b = \{\theta \in \mathbb{R}^t : L_b^T m(\theta) = d_b\}$, it holds that $M_{b+1} \subset M_b$, where

$$M_b = \{m(\theta) \in \mathbb{R}^k : \log m(\theta) = X\theta,\ \theta \in \Theta_b\}.$$

In the following, a generalized definition of nested LMLC is given, which covers Definitions 4.1 and 4.2.

Definition 4.3. In a sequence of LMLC $\{M_b\}_{b=1}^{B}$ such that

$$M_b = \{m(\theta_b) \in \mathbb{R}^k : \log m(\theta_b) = X_b \theta_b,\ \theta_b \in \Theta_b\},$$

$$\Theta_b = \{\theta_b \in \mathbb{R}^{t_b} : L_b^T m(\theta_b) = d_b\},$$

$$t_b = \mathrm{rank}(X_b), \quad L_b = (X_0, C_b), \quad r_b = \mathrm{rank}(C_b),$$

$M_{b+1}$ is said to be nested within $M_b$ ($b \in \{1, \ldots, B - 1\}$), denoted by $M_{b+1} \subset M_b$, if it holds that

$$\mathcal{C}(X_{b+1}) \subset \mathcal{C}(X_b) \quad \text{and} \quad \mathcal{C}(L_b) \subset \mathcal{C}(L_{b+1}), \tag{23}$$

with $t_{b+1} \le t_b$ and $r_{b+1} \ge r_b$, at least one of the two inequalities being strict.

Once a sequence of nested LMLC $\{M_b\}_{b=1}^{B}$ has been established, our goal is to present $\phi$-divergence test statistics to test successively

$$H_{Null}(b): M_{b+1} \quad \text{against} \quad H_{Alt}(b): M_b - M_{b+1}; \quad b = 1, \ldots, B - 1, \tag{24}$$

where we continue to test as long as the null hypothesis is accepted and we infer an integer $b^*$, such that $b^* \in \{1, \ldots, B - 1\}$, to be the first value $b$ for which $M_{b+1}$ is rejected as null hypothesis, or $b^* = B$ otherwise.
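The successive testing scheme amounts to a simple stopping rule, sketched below. The function name is hypothetical, and the statistics and cutpoints are made-up numbers (3.84 is approximately the 0.95 quantile of a chi-square distribution with one degree of freedom); in practice they would come from the test statistics and chi-square quantiles discussed in this section.

```python
def first_rejected_index(statistics, cutpoints):
    """Return b*, the first b in {1, ..., B - 1} whose null hypothesis M_{b+1} is rejected.

    statistics[b - 1] and cutpoints[b - 1] correspond to the test H_Null(b).
    If no null hypothesis is rejected, B is returned.
    """
    if len(statistics) != len(cutpoints):
        raise ValueError("one cutpoint is needed for each test statistic")
    B = len(statistics) + 1
    for b, (stat, cut) in enumerate(zip(statistics, cutpoints), start=1):
        if stat > cut:
            return b  # stop testing at the first rejection
    return B


# Hypothetical sequence of B = 4 nested models, i.e. three successive tests.
b_star = first_rejected_index([1.2, 9.5, 0.3], [3.84, 3.84, 3.84])
b_star_none = first_rejected_index([1.0, 2.0], [3.84, 3.84])
```

In the first call the second test is the first to reject, so $M_2$ is retained; in the second call no test rejects and the most restrictive model $M_B$ is retained.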

In Agresti [1, Section 4.5.4] the classical likelihood ratio test statistic for loglinear models ($r_b = r_{b+1} = 0$, $c \ge 0$) is given,

$$G^2(\widehat{\theta}_{b+1} | \widehat{\theta}_b) = 2 \sum_{i=1}^{k} \left(m_i(\widehat{\theta}_b) \log \frac{m_i(\widehat{\theta}_b)}{m_i(\widehat{\theta}_{b+1})} - m_i(\widehat{\theta}_b) + m_i(\widehat{\theta}_{b+1})\right)$$

$$= 2 \left(D_{Kull}(n, m(\widehat{\theta}_{b+1})) - D_{Kull}(n, m(\widehat{\theta}_b))\right), \tag{25}$$

where $\widehat{\theta}_{b+1}$ and $\widehat{\theta}_b$ are the MLE's of the parameter in the models $M_{b+1}$ and $M_b$ respectively. It is also shown that $G^2(\widehat{\theta}_{b+1} | \widehat{\theta}_b)$ is asymptotically distributed according to a chi-square with $t_b - t_{b+1}$ degrees of freedom under the null hypothesis of (24). Minimizing the Kullback divergence measure over a smaller parameter space cannot yield a smaller minimum value, therefore $D_{Kull}(n, m(\widehat{\theta}_{b+1})) \ge D_{Kull}(n, m(\widehat{\theta}_b))$. However, there is another interesting way to show the same inequality, which is based on proving

$$D_{Kull}(n, m(\widehat{\theta}_{b+1})) - D_{Kull}(n, m(\widehat{\theta}_b)) = D_{Kull}\!\left(m(\widehat{\theta}_b), m(\widehat{\theta}_{b+1})\right), \tag{26}$$

whose non-negativity is guaranteed by the common property of all divergence measures. Based on (25) and (26), in Martin and Pardo [22, Section 4] divergence-based test statistics were introduced for the same models ($r_b = r_{b+1} = 0$, $c \ge 0$),

$$S^{\phi}(\widehat{\theta}^{\phi}_{b+1} | \widehat{\theta}^{\phi}_b) = \frac{2}{\phi''(1)} \left(D_\phi(n, m(\widehat{\theta}^{\phi}_{b+1})) - D_\phi(n, m(\widehat{\theta}^{\phi}_b))\right) \tag{27}$$

and

$$T^{\phi_1}(\widehat{\theta}^{\phi_2}_{b+1} | \widehat{\theta}^{\phi_2}_b) = \frac{2}{\phi_1''(1)}\, D_{\phi_1}\!\left(m(\widehat{\theta}^{\phi_2}_b), m(\widehat{\theta}^{\phi_2}_{b+1})\right), \tag{28}$$

whose asymptotic distribution under the null hypothesis of (24) was shown to be exactly the same as that of $G^2(\widehat{\theta}_{b+1} | \widehat{\theta}_b)$ for both of them. It should be pointed out that (26) does not hold in general when the Kullback divergence measure is replaced by an arbitrary $\phi$-divergence measure.

In the more general framework of LMLC ($r_{b+1} \ge r_b \ge 0$, $c \ge 0$) we shall establish herein that, under the null hypothesis of (24), the test statistics $T^{\phi_1}(\widehat{\theta}^{\phi_2}_{b+1} | \widehat{\theta}^{\phi_2}_b)$ and $S^{\phi}(\widehat{\theta}^{\phi}_{b+1} | \widehat{\theta}^{\phi}_b)$ converge in law to a chi-square with $t_b - t_{b+1} - r_b + r_{b+1}$ degrees of freedom ($\chi^2_{t_b - t_{b+1} - r_b + r_{b+1}}$), $b = 1, \ldots, B - 1$. Thus, $\chi^2_{t_b - t_{b+1} - r_b + r_{b+1}}(1 - \alpha)$ could be chosen as a cutpoint for the rejection region.

Theorem 4.2. Suppose that the data $n = (n_1, \ldots, n_k)^T$ are Poisson distributed. Choose the functions $\phi_1, \phi_2 \in \Phi$, where $\Phi$ was defined in Section 2. Then, for testing (24), the asymptotic null distribution of the test statistic $T^{\phi_1}(\widehat{\theta}^{\phi_2}_{b+1} | \widehat{\theta}^{\phi_2}_b)$, given in (28), is chi-squared with $t_b - t_{b+1} - r_b + r_{b+1}$ degrees of freedom.

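Proof. A Taylor expansion similar to the one given in Theorem 4.1 yields

$$T^{\phi_1}(\widehat{\theta}^{\phi_2}_{b+1} | \widehat{\theta}^{\phi_2}_b) = Z_b^T Z_b + o_P(1),$$

where

$$Z_b = D_{m(\theta_0)}^{-1/2} \left(m(\widehat{\theta}^{\phi_2}_b) - m(\widehat{\theta}^{\phi_2}_{b+1})\right)$$

is asymptotically normally distributed with mean vector zero and variance-covariance matrix $T_b^* = K_b - K_{b+1}$, with

$$K_j = A_{X_j}(\theta_{b+1,0}) - A_{X_j}(\theta_{b+1,0})\, D_{m^*(\theta_{b+1,0})}^{1/2} L_j \left(L_j^T D_{m^*(\theta_{b+1,0})}^{1/2} A_{X_j}(\theta_{b+1,0})\, D_{m^*(\theta_{b+1,0})}^{1/2} L_j\right)^{-1} L_j^T D_{m^*(\theta_{b+1,0})}^{1/2} A_{X_j}(\theta_{b+1,0}), \tag{29}$$

$$A_{X_j}(\theta_{b+1,0}) = D_{m^*(\theta_{b+1,0})}^{1/2} X_j \left(X_j^T D_{m^*(\theta_{b+1,0})} X_j\right)^{-1} X_j^T D_{m^*(\theta_{b+1,0})}^{1/2}, \tag{30}$$

for $j = b, b + 1$. The asymptotic distribution of the test statistic $T^{\phi_1}(\widehat{\theta}^{\phi_2}_{b+1} | \widehat{\theta}^{\phi_2}_b)$ will be a chi-square if the matrix $T_b^*$ is idempotent and symmetric. It is clear that $T_b^*$ is symmetric; we shall establish that it is also idempotent. Since $\mathcal{C}(D_{m^*(\theta_{b+1,0})}^{1/2} X_{b+1}) \subset \mathcal{C}(D_{m^*(\theta_{b+1,0})}^{1/2} X_b)$ we have

$$A_{X_{b+1}}(\theta_{b+1,0}) = A_{X_{b+1}}(\theta_{b+1,0}) A_{X_{b+1}}(\theta_{b+1,0}) = A_{X_{b+1}}(\theta_{b+1,0}) A_{X_b}(\theta_{b+1,0}),$$

$$A_{X_b}(\theta_{b+1,0}) = A_{X_b}(\theta_{b+1,0}) A_{X_b}(\theta_{b+1,0}),$$

and on the other hand, since $\mathcal{C}(L_b) \subset \mathcal{C}(L_{b+1})$, there exists a matrix $B$ such that $L_b = L_{b+1} B$. Thus it holds that

i) $K_{b+1} = K_{b+1} K_{b+1} = K_{b+1} K_b = K_b K_{b+1}$,

ii) $K_b = K_b K_b$,

which implies $T_b^* T_b^* = T_b^*$.

The degrees of freedom of the chi-squared distributed statistic $T^{\phi_1}(\widehat{\theta}^{\phi_2}_{b+1} | \widehat{\theta}^{\phi_2}_b)$ coincide with the trace of the matrix $T_b^*$, i.e. $t_b - t_{b+1} - r_b + r_{b+1}$. □

For the test statistic $S^{\phi}(\widehat{\theta}^{\phi}_{b+1} | \widehat{\theta}^{\phi}_b)$ the same result as in Theorem 4.2 can be obtained by following a similar proof.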

Remark 4.2. Consider the saturated LMLC, i.e. the design matrix of the loglinear model is given by a $k \times k$ matrix $X_1$ ($t_1 = k$), and we may assume, without any loss of generality, $X_1 = I_k$. On the other hand, apart from the constraints associated with the sampling scheme ($c \ge 0$) there is no additional linear constraint ($r_1 = 0$):

$$M_1 = \{m(\theta_1) \in \mathbb{R}^k : \log m(\theta_1) = \theta_1,\ \theta_1 \in \Theta_1\},$$

$$\Theta_1 = \{\theta_1 \in \mathbb{R}^k : X_0^T m(\theta_1) = X_0^T n\}.$$

Consider also a generic LMLC

$$M_2 = \{m(\theta_2) \in \mathbb{R}^k : \log m(\theta_2) = X_2 \theta_2,\ \theta_2 \in \Theta_2\},$$

$$\Theta_2 = \{\theta_2 \in \mathbb{R}^{t_2} : L_2^T m(\theta_2) = d_2\},$$

where $t_2 \le k$, $r_2 \ge 0$, at least one of the two inequalities being strict. The hypothesis test (24) for the two nested LMLC above ($B = 2$) is the same as the goodness-of-fit test (17) associated with the model $M_2$. Therefore $T^{\phi_1}(\widehat{\theta}^{\phi_2}_2 | \widehat{\theta}^{\phi_2}_1) = T^{\phi_1}(\widehat{\theta}^{\phi_2}_2)$.

To test the sequence of LMLC (24), $b = 1, \ldots, b^*$, we need an asymptotic independence result for the sequence of test statistics $\{T^{\phi_1}(\widehat{\theta}^{\phi_2}_{b+1} | \widehat{\theta}^{\phi_2}_b)\}_{b=1}^{B-1}$ (or $\{S^{\phi}(\widehat{\theta}^{\phi}_{b+1} | \widehat{\theta}^{\phi}_b)\}_{b=1}^{B-1}$). This result is given in the theorem below.

Theorem 4.3. Suppose that data n = (ni, rifc) T are Poisson distributed. We first 
test, H Nua : M b+1 against H AH : M b , followed by H NuU : M b against H AH : M 6 _i. Then, 

under the assumption that it holds M b +i, the statistics T^ 1 (0 b ^ 1 |0 fe 2 ) and T^ 1 (6 b 2 \6 b 2 _ 1 ) 
are asymptotically independent. 

Proof. A second order Taylor's expansion gives 

$$T^{\phi_1}(\hat\theta_j^{\phi_2}|\hat\theta_{j-1}^{\phi_2}) = Z_j^T Z_j + o_P(1), \quad j \in \{b+1, b\},$$

where 

and $T_j^* = K_j - K_{j+1}$ with $K_j$, $j = b, b-1$, defined in (29). By Searle [30, Theorem 4 in page 59] the quadratic forms

for $j = b+1, b$, are asymptotically independent if

$$D^{-1/2}_{m^*(\theta_{b+1,0})} T_{b+1}^* D^{-1/2}_{m^*(\theta_{b+1,0})} \Sigma D^{-1/2}_{m^*(\theta_{b+1,0})} T_b^* D^{-1/2}_{m^*(\theta_{b+1,0})} = 0_{k \times k},$$

where the matrix $\Sigma = D_{m^*(\theta_{b+1,0})}$ is the asymptotic variance-covariance matrix of the vector $(n - m(\theta_{b+1,0}))/\sqrt{N}$. By following an argument similar to the one given in the proof of Theorem 4.2 to see that $T_b^*$ is idempotent, it follows that $T_{b+1}^* T_b^* = 0_{k \times k}$. □

For (27) the same result as Theorem 4.2 can be obtained by following a similar proof.

According to Agresti [1, page 215], once an asymptotic type I error probability equal to $1 - (1 - \alpha)^{\frac{1}{B-1}}$ has been established for each test in a sequence of nested tests, the overall asymptotic probability of type I error is less than or equal to $\alpha$. In the next theorem a stronger result is given.

Theorem 4.4. For a sequence of $B - 1$ tests (21) associated with a sequence of LMLC $\{M_b\}_{b=1}^{B}$, when each test has a size equal to $1 - (1 - \alpha)^{\frac{1}{B-1}}$, the overall size of the tests is given by $\alpha$.
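Proof. For the purpose of establishing a size equal to $1 - (1 - \alpha)^{\frac{1}{B-1}}$ for each hypothesis test in (21) we shall consider, according to Theorem 4.2, a cutpoint for the rejection region equal to $\chi^2_{df}((1 - \alpha)^{\frac{1}{B-1}})$, where $df = t_b - t_{b+1} - r_b + r_{b+1}$. Thus, the overall size for testing (24) in a sequence of nested LMLC $\{M_b\}_{b=1}^{B}$ is given by

$$\Pr\left(\exists b \in \{1, \ldots, B-1\} : T^{\phi_1}(\hat\theta_{b+1}^{\phi_2}|\hat\theta_b^{\phi_2}) > \chi^2_{df}((1-\alpha)^{\frac{1}{B-1}}) \,\middle|\, H_{Null}(b)\right)$$
$$= 1 - \Pr\left(T^{\phi_1}(\hat\theta_{b+1}^{\phi_2}|\hat\theta_b^{\phi_2}) \le \chi^2_{df}((1-\alpha)^{\frac{1}{B-1}}) \,\middle|\, H_{Null}(b),\ b = 1, \ldots, B-1\right)$$
$$= 1 - \prod_{b=1}^{B-1} \Pr\left(T^{\phi_1}(\hat\theta_{b+1}^{\phi_2}|\hat\theta_b^{\phi_2}) \le \chi^2_{df}((1-\alpha)^{\frac{1}{B-1}}) \,\middle|\, H_{Null}(b)\right)$$
$$= 1 - \prod_{b=1}^{B-1} (1-\alpha)^{\frac{1}{B-1}} = \alpha.$$

The second equality comes from Theorem 4.3 and the third one from Theorem 4.2. □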
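
The arithmetic behind Theorem 4.4 can be checked numerically. The following is a minimal sketch, assuming the $B - 1$ tests are exactly independent (the theorem only states this asymptotically); the function names `per_test_size` and `overall_size` are illustrative and do not come from the paper.

```python
def per_test_size(alpha: float, n_tests: int) -> float:
    """Size assigned to each of n_tests tests so that the overall size is alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


def overall_size(sizes) -> float:
    """Overall size of a sequence of independent tests with the given sizes."""
    prob_no_rejection = 1.0
    for size in sizes:
        prob_no_rejection *= 1.0 - size
    return 1.0 - prob_no_rejection


if __name__ == "__main__":
    alpha, B = 0.05, 3                      # B = 3 nested models, hence B - 1 = 2 tests
    size_b = per_test_size(alpha, B - 1)    # 1 - (1 - alpha)^(1/(B-1))
    print(f"per-test size: {size_b:.5f}")
    print(f"overall size:  {overall_size([size_b] * (B - 1)):.5f}")
```

With $\alpha = 0.05$ and $B = 3$, as in the simulation study of Section 5, each test is run at a size of about 0.0253, slightly larger than the Bonferroni value $\alpha/(B-1) = 0.025$.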

5 Simulation study: Marginal homogeneity 

5.1 Description of conditional and unconditional tests 

The traditionally so-called conditional test for marginal homogeneity (MH) was applied for the first time in Caussinus [5]. He noted that once it is known that the quasi-symmetry (QS) model holds, marginal homogeneity (MH) is equivalent to symmetry (S). In other words, because S is a model nested within QS ($M_S \subset M_{QS}$), we could first test whether the QS model holds against the alternative hypothesis of the saturated model (SAT), defined in Remark 4.2 ($M_{QS} \subset M_{SAT}$),

$H_{Null}(1): M_{QS}$ against $H_{Alt}(1): M_{SAT} - M_{QS}$, (31)

and after that

$H_{Null}(2): M_S$ against $H_{Alt}(2): M_{QS} - M_S$. (32)

Focussing on a square $I \times I$ contingency table with multinomial sampling ($c = 1$), one could be interested in analyzing what the difference is between testing the conditional model above and the unconditional model of MH below

$H_{Null}: M_{MH}$ against $H_{Alt}: M_{SAT} - M_{MH}$. (33)

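The formulation of these models for two-way contingency tables is

$$\sum_{j=1}^{I} m_{ij}(\theta_{MH}) = \sum_{j=1}^{I} m_{ji}(\theta_{MH}) \quad \text{or} \quad m_{i\bullet}(\theta_{MH}) = m_{\bullet i}(\theta_{MH}), \quad i = 1, \ldots, I;$$

$$\log m_{ij}(\theta_S) = u + \theta_i + \theta_j + \theta_{ij}, \quad i, j = 1, \ldots, I,$$

with $\theta_{ij} = \theta_{ji}$, $\forall i \neq j$, $\sum_{i=1}^{I} \theta_i = 0$, $\sum_{i=1}^{I} \theta_{ij} = 0$, $j = 1, \ldots, I$;

$$\log m_{ij}(\theta_{OQS}) = u + \theta_{1(i)} + \theta_{1(j)} + \beta w_j + \theta_{12(ij)}, \quad i, j = 1, \ldots, I,$$

with $\theta_{12(ij)} = \theta_{12(ji)}$, $\forall i \neq j$, $\sum_{i=1}^{I} \theta_{1(i)} = 0$, $\sum_{i=1}^{I} \theta_{12(ij)} = 0$, $j = 1, \ldots, I$, $\sum_{j=1}^{I} w_j = 0$, $\sum_{j=1}^{I} w_j^2 = 1$, where $\{w_j\}_{j=1}^{I}$ is a set of weights associated with each category $j \in \{1, \ldots, I\}$ such that the distance between contiguous ones is fixed, i.e. $w_{j+1} - w_j$ is the same for every $j = 1, \ldots, I-1$ (for more details about the interpretation of this model see Kateri and Agresti [17]);

$$\log m_{ij}(\theta_{QS}) = u + \theta_{1(i)} + \theta_{2(j)} + \theta_{12(ij)}, \quad i, j = 1, \ldots, I,$$

with $\theta_{12(ij)} = \theta_{12(ji)}$, $\forall i \neq j$, $\sum_{i=1}^{I} \theta_{1(i)} = \sum_{j=1}^{I} \theta_{2(j)} = 0$, $\sum_{i=1}^{I} \theta_{12(ij)} = 0$, $j = 1, \ldots, I$.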

While $M_S$ and $M_{QS}$ are loglinear models, $M_{MH}$ is a marginal model (see Remark 2.1). By taking into account the meaning of both tests, the initial conditions are different: while a rejection of $H_{Null}$ implies that MH is not accepted, a rejection of $H_{Null}(1)$ does not imply the same fact, i.e. even though QS is rejected, MH could hold. However, an acceptance of $H_{Null}$ and an acceptance of $H_{Null}(2)$ imply the same hypothesis, i.e. MH is accepted. Because all models, $M_{SAT}$, $M_{QS}$, $M_S$ and $M_{MH}$, are LMLC, it is possible to establish the same true model for both tests (31)-(32) and (33), by choosing a suitable parametrization according to the design matrices. On the other hand, it will be possible to carry out such a test in order to compare the exact size and power, through a simulation experiment using $\phi$-divergence based test statistics. For this purpose we shall focus on M$\phi_{(\lambda_2)}$E's with $\lambda_2 \in \{-0.5, 0, 2/3, 1, 2\}$ (i.e., with $\lambda_2 = 0$ MLE's are included), as well as on the same family of $\phi_{(\lambda_1)}$-divergence measures (7) for building test statistics, with $\lambda_1 \in \{-0.5, 0, 2/3, 1, 2\}$ (i.e., once MLE's are included, with $\lambda_1 = 0$ and $\lambda_1 = 1$ the classical test statistics are obtained, the likelihood ratio $G^2(\hat\theta_{b+1}|\hat\theta_b)$ and chi-squared $X^2(\hat\theta_{b+1}|\hat\theta_b)$ test statistics respectively)

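$$T^{(\lambda_1)}(\hat\theta_{b+1}^{(\lambda_2)}|\hat\theta_b^{(\lambda_2)}) = \begin{cases} \dfrac{2}{\lambda_1(\lambda_1+1)} \left( \displaystyle\sum_{i=1}^{I}\sum_{j=1}^{I} \dfrac{m_{ij}^{\lambda_1+1}(\hat\theta_b^{(\lambda_2)})}{m_{ij}^{\lambda_1}(\hat\theta_{b+1}^{(\lambda_2)})} - n \right), & \lambda_1(\lambda_1+1) \neq 0, \\[2ex] 2 \displaystyle\sum_{i=1}^{I}\sum_{j=1}^{I} m_{ij}(\hat\theta_b^{(\lambda_2)}) \log \dfrac{m_{ij}(\hat\theta_b^{(\lambda_2)})}{m_{ij}(\hat\theta_{b+1}^{(\lambda_2)})}, & \lambda_1 = 0, \\[2ex] 2 \displaystyle\sum_{i=1}^{I}\sum_{j=1}^{I} m_{ij}(\hat\theta_{b+1}^{(\lambda_2)}) \log \dfrac{m_{ij}(\hat\theta_{b+1}^{(\lambda_2)})}{m_{ij}(\hat\theta_b^{(\lambda_2)})}, & \lambda_1 = -1, \end{cases}$$

where $n$ and $I$ are respectively the total table count and the table size (i.e., $k = I^2$ and $n = \sum_{i=1}^{I}\sum_{j=1}^{I} n_{ij}$). In particular, if the saturated LMLC is considered as $M_b$, then $m_{ij}(\hat\theta_b^{(\lambda_2)}) = n_{ij}$ (see Remark 4.2).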
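
As an illustration of the power-divergence family of test statistics, the sketch below evaluates $T^{(\lambda_1)}$ for two fitted tables stored as flat lists. It assumes that both fitted tables are strictly positive and share the same total $n$ (as happens under the multinomial sampling constraint); the function name `power_divergence_statistic` is hypothetical, and the example counts are invented for the illustration.

```python
import math


def power_divergence_statistic(m_b, m_b1, lam):
    """T^(lam) comparing fitted means m_b (larger model) with m_b1 (nested model).

    Both inputs are flat sequences of positive fitted cell means with a
    common total n; lam plays the role of lambda_1.
    """
    if lam == 0:       # likelihood-ratio statistic G^2
        return 2.0 * sum(p * math.log(p / q) for p, q in zip(m_b, m_b1))
    if lam == -1:      # roles of the two fitted tables are swapped
        return 2.0 * sum(q * math.log(q / p) for p, q in zip(m_b, m_b1))
    n = sum(m_b)
    core = sum(p ** (lam + 1) / q ** lam for p, q in zip(m_b, m_b1))
    return 2.0 / (lam * (lam + 1)) * (core - n)


if __name__ == "__main__":
    observed = [30.0, 20.0, 50.0]   # saturated fit: m_ij = n_ij (invented counts)
    fitted = [25.0, 25.0, 50.0]     # fit under a hypothetical nested model
    for lam in (-0.5, 0, 2 / 3, 1, 2):
        value = power_divergence_statistic(observed, fitted, lam)
        print(f"lambda_1 = {lam:6.3f}: {value:.4f}")
```

For $\lambda_1 = 1$ the function returns the Pearson statistic $\sum (p - q)^2 / q$ and for $\lambda_1 = 0$ the likelihood ratio statistic, which is the correspondence stated in the text.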

We shall also consider another conditional test for MH, which is based on the ordinal quasi-symmetry model (OQS), instead of the previously considered QS,

$H_{Null}(1'): M_{OQS}$ against $H_{Alt}(1'): M_{SAT} - M_{OQS}$, (34)

and

$H_{Null}(2'): M_S$ against $H_{Alt}(2'): M_{OQS} - M_S$ (35)

($M_S \subset M_{OQS} \subset M_{SAT}$). Although OQS is usually applied to ordinal categorical data because of its interpretation, it is possible to consider OQS as an LMLC, in a generic way, by defining its design matrix (for more information about this model see Agresti [2, Section 8.4]). In order to compute the powers for the two conditional tests, apart from considering a common true model, because $M_S \subset M_{OQS} \subset M_{QS}$, we shall consider the same points of the alternative hypotheses.

Table 1

Theoretical probabilities for an $I \times I$ table ($I = 4$)

$m^*_{ij}(\theta_0)$             1         2         3         4     $m^*_{i\bullet}(\theta_0)$
1                        0.08161   0.03156   0.01647   0.01050   0.14017
2                        0.03156   0.21104   0.05204   0.01418   0.30883
3                        0.01647   0.05204   0.22186   0.03156   0.32195
4                        0.01050   0.01418   0.03156   0.17278   0.22905
$m^*_{\bullet j}(\theta_0)$      0.14017   0.30883   0.32195   0.22905   $m^*_{\bullet\bullet}(\theta_0) = 1$



In Table 1 the theoretical probability vector belonging to a multinomial sampling scheme with $n \in \{100, 250, 400, 550\}$ is shown. Its corresponding values for the parameters of each model (null hypotheses) are also given ($t_{MH} = 16$, $t_S = 10$, $t_{OQS} = 11$ and $t_{QS} = 13$):

$$\theta_{MH} = (u_{MH}, \theta_{1(1)}, \theta_{1(2)}, \theta_{1(3)}, \theta_{2(1)}, \theta_{2(2)}, \theta_{2(3)}, \theta_{12(11)}, \theta_{12(12)}, \theta_{12(13)}, \theta_{12(21)}, \theta_{12(22)}, \theta_{12(23)}, \theta_{12(31)}, \theta_{12(32)}, \theta_{12(33)})^T$$
$$= (u_{MH}, -0.95, -1.6, -2.05, -0.95, 0.95, -0.45, -1.75, -1.6, -0.45, 1.0, -0.95, -2.05, -1.75, -0.95, 0.75)^T,$$

$$\theta_S = (u_S, \theta_1, \theta_2, \theta_3, \theta_{11}, \theta_{22}, \theta_{12}, \theta_{13}, \theta_{24}, \theta_{34})^T$$
$$= (u_S, -0.35, 0.25, 0.3, 1.5, 1.25, -0.05, -0.75, -1, -0.25)^T,$$

$$\theta_{OQS} = (\theta_S^T, 0)^T, \quad \theta_{QS} = (\theta_S^T, 0, 0, 0)^T$$

(the values of $u_{MH}$ and $u_S$ are obtained through the different sample sizes).
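
The probabilities of Table 1 can be recovered from the listed components of $\theta_S$. The following is a minimal sketch, assuming the sum-to-zero constraints of the symmetry model of Section 5.1 (the last main effect and the unlisted interactions are obtained from them) and taking $u_S$ as the normalising constant that makes the table sum to one; the function name `symmetry_probabilities` is illustrative.

```python
import math


def symmetry_probabilities(theta_main, theta_int, size):
    """Cell probabilities of a symmetry model from its free parameters.

    theta_main: the first size-1 main effects (the last one follows from the
                sum-to-zero constraint).
    theta_int:  dict {(i, j): value}, 1-based, with the free symmetric
                interactions; the others follow from zero row sums.
    """
    main = list(theta_main) + [-sum(theta_main)]
    inter = [[None] * size for _ in range(size)]
    for (i, j), value in theta_int.items():
        inter[i - 1][j - 1] = inter[j - 1][i - 1] = value
    # Repeatedly complete every row that has exactly one missing interaction.
    changed = True
    while changed:
        changed = False
        for i in range(size):
            missing = [j for j in range(size) if inter[i][j] is None]
            if len(missing) == 1:
                j = missing[0]
                value = -sum(x for x in inter[i] if x is not None)
                inter[i][j] = inter[j][i] = value
                changed = True
    if any(x is None for row in inter for x in row):
        raise ValueError("interactions are not identified by the constraints")
    weights = [[math.exp(main[i] + main[j] + inter[i][j]) for j in range(size)]
               for i in range(size)]
    total = sum(map(sum, weights))      # u_S = -log(total) for probabilities
    return [[w / total for w in row] for row in weights]


THETA_MAIN = [-0.35, 0.25, 0.3]
THETA_INT = {(1, 1): 1.5, (2, 2): 1.25, (1, 2): -0.05,
             (1, 3): -0.75, (2, 4): -1.0, (3, 4): -0.25}
probs = symmetry_probabilities(THETA_MAIN, THETA_INT, 4)

if __name__ == "__main__":
    for row in probs:
        print("  ".join(f"{p:.5f}" for p in row))
```

The resulting matrix is symmetric, so its row and column margins coincide, and it agrees with Table 1 up to the last printed digit.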
5.2 Simulated Exact sizes and powers 

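Focussing first on the tests (34)-(35), the simulation study is based on repeating the random experiments described above $R = 10{,}000$ times to compute, on the one hand, the "exact sizes" by simulation

$$\hat\alpha_n^{(\lambda_1,\lambda_2)}(\theta_{OQS}) = \frac{\#\{T^{(\lambda_1)}(\hat\theta_{OQS}^{(\lambda_2)}|\hat\theta_{SAT}^{(\lambda_2)}) > \chi^2_{(I+1)(I-2)/2}((1-\alpha)^{\frac{1}{2}}) \mid \theta_{OQS}\}}{R},$$

$$\hat\alpha_n^{(\lambda_1,\lambda_2)}(\theta_S) = \frac{\#\{T^{(\lambda_1)}(\hat\theta_S^{(\lambda_2)}|\hat\theta_{OQS}^{(\lambda_2)}) > \chi^2_{1}((1-\alpha)^{\frac{1}{2}}) \mid \theta_S\}}{R},$$

$$\hat\alpha_n^{(\lambda_1,\lambda_2)} = 1 - (1 - \hat\alpha_n^{(\lambda_1,\lambda_2)}(\theta_{OQS}))(1 - \hat\alpha_n^{(\lambda_1,\lambda_2)}(\theta_S)),$$

once a simulated "nominal" size of $1 - (1 - \alpha)^{\frac{1}{2}}$, with $\alpha = 0.05$, has been chosen for each test. In order to calculate simulated exact powers 12 points are chosen (6 for (34) and 6 for (35)),

$$\theta_{SAT}(i) = (u_{SAT}, \theta_{1(1)}, \theta_{1(2)}, \theta_{1(3)}, \theta_{2(1)}, \theta_{2(2)}, \theta_{2(3)} + \delta_1(i), \theta_{12(11)}, \theta_{12(12)}, \theta_{12(13)} + \delta_2(i), \theta_{12(21)}, \theta_{12(22)}, \theta_{12(23)}, \theta_{12(31)}, \theta_{12(32)}, \theta_{12(33)})^T$$
$$= (u_{SAT}, -0.95, -1.6, -2.05, -0.95, 0.95, -0.45 + \delta_1(i), -1.75, -1.6, -0.45 + \delta_2(i), 1.0, -0.95, -2.05, -1.75, -0.95, 0.75)^T, \quad (36)$$

with $((\delta_1(1), \delta_2(1)), \ldots, (\delta_1(6), \delta_2(6)))^T = ((0.45, 0), (0.7, 0), (0.9, 0), (0, 0.45), (0, 0.7), (0, 0.9))^T$,

$$\theta_{OQS}(i) = (u_{OQS}, \theta_1, \theta_2, \theta_3, \theta_{11}, \theta_{22}, \theta_{12}, \theta_{13}, \theta_{24}, \theta_{34}, \beta(i))^T$$
$$= (u_{OQS}, -0.35, 0.25, 0.3, 1.5, 1.25, -0.05, -0.75, -1, -0.25, \beta(i))^T, \quad (37)$$

with $(\beta(7), \ldots, \beta(12))^T = (0.5, 0.7, 1.0, -0.5, -0.7, -1.0)^T$ (the values of $u_{SAT}$ and $u_{OQS}$ are obtained through the different sample sizes). Thus the simulated exact powers are given by

$$\hat\beta_n^{(\lambda_1,\lambda_2)}(\theta_{SAT}(i)) = \frac{\#\{T^{(\lambda_1)}(\hat\theta_{OQS}^{(\lambda_2)}|\hat\theta_{SAT}^{(\lambda_2)}) > \chi^2_{(I+1)(I-2)/2}((1-\alpha)^{\frac{1}{2}}) \mid \theta_{SAT}(i)\}}{R},$$

$i = 1, \ldots, 6$,

$$\hat\beta_n^{(\lambda_1,\lambda_2)}(\theta_{OQS}(i)) = \frac{\#\{T^{(\lambda_1)}(\hat\theta_S^{(\lambda_2)}|\hat\theta_{OQS}^{(\lambda_2)}) > \chi^2_{1}((1-\alpha)^{\frac{1}{2}}) \mid \theta_{OQS}(i)\}}{R},$$

$i = 7, \ldots, 12$. Focussing on the tests (31)-(32), simulated exact sizes are given by

$$\hat\alpha_n^{(\lambda_1,\lambda_2)}(\theta_{QS}) = \frac{\#\{T^{(\lambda_1)}(\hat\theta_{QS}^{(\lambda_2)}|\hat\theta_{SAT}^{(\lambda_2)}) > \chi^2_{(I-1)(I-2)/2}((1-\alpha)^{\frac{1}{2}}) \mid \theta_{QS}\}}{R},$$

$$\hat\alpha_n^{(\lambda_1,\lambda_2)}(\theta_S) = \frac{\#\{T^{(\lambda_1)}(\hat\theta_S^{(\lambda_2)}|\hat\theta_{QS}^{(\lambda_2)}) > \chi^2_{I-1}((1-\alpha)^{\frac{1}{2}}) \mid \theta_S\}}{R},$$

$$\hat\alpha_n^{(\lambda_1,\lambda_2)} = 1 - (1 - \hat\alpha_n^{(\lambda_1,\lambda_2)}(\theta_{QS}))(1 - \hat\alpha_n^{(\lambda_1,\lambda_2)}(\theta_S)),$$

and to calculate simulated exact powers 12 points are chosen (6 for (31) and 6 for (32)); the same points (36) and (37) are valid taking into account that $\theta_{QS}(i) = (\theta_{OQS}^T(i), 0, 0)^T$,

$$\hat\beta_n^{(\lambda_1,\lambda_2)}(\theta_{SAT}(i)) = \frac{\#\{T^{(\lambda_1)}(\hat\theta_{QS}^{(\lambda_2)}|\hat\theta_{SAT}^{(\lambda_2)}) > \chi^2_{(I-1)(I-2)/2}((1-\alpha)^{\frac{1}{2}}) \mid \theta_{SAT}(i)\}}{R},$$

$i = 1, \ldots, 6$,

$$\hat\beta_n^{(\lambda_1,\lambda_2)}(\theta_{QS}(i)) = \frac{\#\{T^{(\lambda_1)}(\hat\theta_S^{(\lambda_2)}|\hat\theta_{QS}^{(\lambda_2)}) > \chi^2_{I-1}((1-\alpha)^{\frac{1}{2}}) \mid \theta_{QS}(i)\}}{R},$$

$i = 7, \ldots, 12$. For the goodness-of-fit test (33) we have, as usual,

$$\hat\alpha_n^{(\lambda_1,\lambda_2)}(\theta_{MH}) = \frac{\#\{T^{(\lambda_1)}(\hat\theta_{MH}^{(\lambda_2)}) > \chi^2_{I-1}(1-\alpha) \mid \theta_{MH}\}}{R},$$

and to calculate simulated exact powers the same 12 points above are chosen

$$\hat\beta_n^{(\lambda_1,\lambda_2)}(\theta_{SAT}(i)) = \frac{\#\{T^{(\lambda_1)}(\hat\theta_{MH}^{(\lambda_2)}) > \chi^2_{I-1}(1-\alpha) \mid \theta_{SAT}(i)\}}{R},$$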
Table 2

Simulated exact sizes $\hat\alpha_n^{(\lambda_1,\lambda_2)}$ with $n \in \{100, 250\}$ (three test columns per sample size, in source order; the column headers naming the tests and the first three rows are illegible in the source)

$\lambda_2$   $\lambda_1$  |         n = 100           |         n = 250
0     -0.5  |  [illegible in the source]
      0     |  [illegible in the source]
      2/3   |  [illegible in the source]
      1     |  0.0479   0.0441   0.0401  |  0.0432   0.0391   0.0422
      2     |  0.0500   0.0707   0.0512  |  0.0441   0.0431   0.0434
2/3   -0.5  |  0.1407   0.2595   0.2600  |  0.0642   0.1374   0.1292
      0     |  0.0808   0.0807   0.0705  |  0.0523   0.0670   0.0603
      2/3   |  0.0376   0.0257   0.0244  |  0.0411   0.0372   0.0350
      1     |  0.0254   0.0188   0.0194  |  0.0369   0.0308   0.0295
      2     |  0.0118   0.0173   0.0159  |  0.0280   0.0214   0.0243
1     -0.5  |  0.1725   0.2872   0.2832  |  0.0743   0.1386   0.1500
      0     |  0.0923   0.0883   0.0780  |  0.0578   0.0658   0.0726
      2/3   |  0.0378   0.0257   0.0298  |  0.0422   0.0347   0.0371
      1     |  0.0237   0.0197   0.0258  |  0.0364   0.0288   0.0290
      2     |  0.0084   0.0181   0.0295  |  0.0239   0.0218   0.0179

Table 3

Simulated exact sizes $\hat\alpha_n^{(\lambda_1,\lambda_2)}$ with $n \in \{400, 550\}$ (three test columns per sample size, in source order; the column headers naming the tests are illegible in the source)

$\lambda_2$   $\lambda_1$  |         n = 400           |         n = 550
0     -0.5  |  0.0587   0.0800   0.0704  |  0.0554   0.0664   0.0664
      0     |  0.0556   0.0589   0.0553  |  0.0530   0.0566   0.0566
      2/3   |  0.0526   0.0479   0.0482  |  0.0515   0.0495   0.0495
      1     |  0.0518   0.0454   0.0461  |  0.0510   0.0483   0.0483
      2     |  0.0520   0.0460   0.0464  |  0.0509   0.0483   0.0483
2/3   -0.5  |  0.0585   0.0877   0.0800  |  0.0578   0.0703   0.0703
      0     |  0.0503   0.0609   0.0555  |  0.0541   0.0585   0.0585
      2/3   |  0.0441   0.0440   0.0413  |  0.0494   0.0463   0.0463
      1     |  0.0415   0.0384   0.0372  |  0.0480   0.0418   0.0418
      2     |  0.0375   0.0299   0.0309  |  0.0433   0.0353   0.0353
1     -0.5  |  0.0628   0.0887   0.0960  |  0.0609   0.0738   0.0738
      0     |  0.0541   0.0581   0.0653  |  0.0559   0.0584   0.0584
      2/3   |  0.0444   0.0413   0.0439  |  0.0500   0.0457   0.0457
      1     |  0.0411   0.0367   0.0371  |  0.0479   0.0415   0.0415
      2     |  0.0344   0.0290   0.0256  |  0.0410   0.0335   0.0335



POISSON LOGLINEAR MODELING WITH LINEAR CONSTRAINTS 






$i = 1, \ldots, 6$,

$$\hat\beta_n^{(\lambda_1,\lambda_2)}(\theta_{OQS}(i)) = \frac{\#\{T^{(\lambda_1)}(\hat\theta_{MH}^{(\lambda_2)}) > \chi^2_{I-1}(1-\alpha) \mid \theta_{OQS}(i)\}}{R},$$

$i = 7, \ldots, 12$. The results of simulated exact sizes are shown in Tables 2 and 3. To illustrate some representative values of the powers, in Table 4 the simulated exact powers focussed on the tests (34)-(35) are shown for the considered 12 points. The so-called size corrected average gradient, defined as

(i ( f / ^• A 'V4 Ai - Aa) V,^ (&i^h3^t± 2) N 2 

_ V V =1 V J »=7 V 

In „(Ai,A2) 

is an overall measure of performance of the simulated exact size as well as the simulated exact powers (this measure was introduced for the first time in Rivas et al. [29]). Such a measure is interpreted as a normalized mean rate of power gain with respect to the null hypothesis along the considered alternatives, and it is therefore useful as a criterion to select a test statistic ($\lambda_1$) as well as its estimator ($\lambda_2$) with the maximum value of $\gamma_n^{(\lambda_1,\lambda_2)}$. The values of $\gamma_n^{(\lambda_1,\lambda_2)}$ for the same kind of test statistics considered in Tables 2 and 3 are shown in Tables 5 and 6.
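
The way a simulated exact size is computed in this section (the proportion of $R$ replications in which the statistic exceeds the chi-square cutpoint) can be sketched with a deliberately simplified stand-in. The code below does not implement the nested tests (31)-(35) or the minimum $\phi$-divergence estimators; it assumes instead the classical test of symmetry against the saturated model, whose fitted values have the closed form $(n_{ij} + n_{ji})/2$ and whose Pearson statistic has $I(I-1)/2$ degrees of freedom. All function names are illustrative, and $R$ is kept small to run quickly.

```python
import math
import random

# Symmetric 4 x 4 probabilities taken from Table 1 (the symmetry model holds).
P = [
    [0.08161, 0.03156, 0.01647, 0.01050],
    [0.03156, 0.21104, 0.05204, 0.01418],
    [0.01647, 0.05204, 0.22186, 0.03156],
    [0.01050, 0.01418, 0.03156, 0.17278],
]


def chi2_cdf_even_df(x, df):
    """Chi-square distribution function for even df (finite Poisson sum)."""
    half, term, total = x / 2.0, 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even_df(p, df, lo=0.0, hi=200.0):
    """Quantile of order p obtained by bisection on the distribution function."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even_df(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def symmetry_pearson_statistic(table):
    """Pearson statistic of the symmetry model against the saturated model."""
    size, stat = len(table), 0.0
    for i in range(size):
        for j in range(i + 1, size):
            pair_total = table[i][j] + table[j][i]
            if pair_total > 0:      # empty pairs contribute nothing
                stat += (table[i][j] - table[j][i]) ** 2 / pair_total
    return stat


def simulated_size(probs, n, alpha, R, seed=0):
    """Proportion of R multinomial tables of total n that reject symmetry."""
    rng = random.Random(seed)
    size = len(probs)
    cells = [(i, j) for i in range(size) for j in range(size)]
    weights = [probs[i][j] for i, j in cells]
    cutpoint = chi2_quantile_even_df(1.0 - alpha, size * (size - 1) // 2)
    rejections = 0
    for _ in range(R):
        table = [[0] * size for _ in range(size)]
        for i, j in rng.choices(cells, weights=weights, k=n):
            table[i][j] += 1
        if symmetry_pearson_statistic(table) > cutpoint:
            rejections += 1
    return rejections / R


if __name__ == "__main__":
    for n in (100, 250):
        print(n, simulated_size(P, n, alpha=0.05, R=2000))
```

Replacing the generating probabilities by a point of the alternative hypothesis turns the same proportion into a simulated exact power.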

5.3 Conclusions 

A clear conclusion from the results of Tables 5 and 6 is the better performance of the sequence of tests (34)-(35) compared with (31)-(32), essentially because the tests (34)-(35) were much more powerful than (31)-(32). One possible reason, as explained in Agresti [1, page 373], could be that the power of a chi-square test tends to increase when the degrees of freedom decrease. This explanation is valid for (35); however, even with a greater number of degrees of freedom for (34), the computed powers of the test (34) in this experiment were also greater than the powers of the test (31). In spite of that, we think that the last result could be affected by the common choice of (36), which means that the points within the region $M_{QS} - M_{OQS}$ are excluded from the alternative hypothesis of (34) because these points fall within the points which are included in the null hypothesis of (31). On the other hand, if one desires to compare the test statistics associated with (34)-(35) with those of (33), even though (34)-(35) perhaps perform better, there is no big difference, and thus unless there is evidence for thinking that OQS holds before carrying out an MH test, it would be more convenient to use the unconditional MH test (33). Within the values of $\gamma_n^{(\lambda_1,\lambda_2)}$ for the same test, either (34)-(35) or (33), the variability is greater for (34)-(35) because the simulated exact sizes are also more variable. Apart from the criterion of the corrected average gradient, a criterion for excluding from the study the simulated exact sizes which are not close or fairly close to the nominal size ($\alpha = 0.05$) is necessary. Through the criterion given by Dale [11], the inequality

$$\left| \mathrm{logit}(1 - \hat\alpha_n^{(\lambda_1,\lambda_2)}) - \mathrm{logit}(1 - \alpha) \right| \le e,$$

with $\mathrm{logit}(p) = \log(p/(1-p))$, is considered, so that the two probabilities, $\hat\alpha_n^{(\lambda_1,\lambda_2)}$ and $\alpha$, are considered to be close if they satisfy such an inequality with $e = 0.35$ and fairly close if they satisfy it with $e = 0.7$. Note that for $\alpha = 0.05$, $e = 0.35$ corresponds to $\hat\alpha_n^{(\lambda_1,\lambda_2)} \in [0.0357, 0.0695]$ and $e = 0.7$ corresponds to $\hat\alpha_n^{(\lambda_1,\lambda_2)} \in [0.0254, 0.0357) \cup (0.0695, 0.0954]$.
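
Dale's criterion is straightforward to apply mechanically. The sketch below, with illustrative function names `dale_bounds` and `classify_size`, inverts the logit inequality to obtain the interval of admissible simulated sizes for a given tolerance and labels a simulated size as close, fairly close or not close.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def dale_bounds(alpha, e):
    """Interval of sizes a with |logit(1 - a) - logit(1 - alpha)| <= e."""
    centre = logit(1.0 - alpha)
    lower = 1.0 - 1.0 / (1.0 + math.exp(-(centre + e)))
    upper = 1.0 - 1.0 / (1.0 + math.exp(-(centre - e)))
    return lower, upper


def classify_size(size, alpha=0.05, close=0.35, fairly_close=0.7):
    """Label a simulated exact size according to Dale's criterion."""
    distance = abs(logit(1.0 - size) - logit(1.0 - alpha))
    if distance <= close:
        return "close"
    if distance <= fairly_close:
        return "fairly close"
    return "not close"


if __name__ == "__main__":
    for e in (0.35, 0.7):
        lower, upper = dale_bounds(0.05, e)
        print(f"e = {e}: [{lower:.4f}, {upper:.4f}]")
    for size in (0.0479, 0.0257, 0.1407):   # values taken from Table 2
        print(size, classify_size(size))
```

For example, the size 0.0479 in Table 2 is classified as close, 0.0257 as fairly close and 0.1407 as not close.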
Table 4

Simulated exact powers $\hat\beta_n^{(\lambda_1,\lambda_2)}(\theta(i))$ for tests (34)-(35), with a fixed value of $\lambda_2$ (the value is illegible in the source)

n     $\lambda_1$  |   i = 1    i = 4    i = 2    i = 5    i = 3    i = 6
100   0     |  0.0720   0.0749   0.1139   0.1201   0.1749   0.1745
      2/3   |  0.0188   0.0163   0.0382   0.0389   0.0736   0.0745
      1     |  0.0110   0.0088   0.0251   0.0252   0.0522   0.0502
      2     |  0.0038   0.0035   0.0095   0.0111   0.0271   0.0257
250   0     |  0.0996   0.0989   0.2123   0.2091   0.3669   0.3637
      2/3   |  0.0582   0.0571   0.1522   0.1516   0.2956   0.2945
      1     |  0.0483   0.0468   0.1320   0.1330   0.2737   0.2716
      2     |  0.0337   0.0323   0.1063   0.1047   0.2303   0.2329
400   0     |  0.1251   0.1161   0.3162   0.3230   0.5575   0.5631
      2/3   |  0.0973   0.0882   0.2760   0.2837   0.5226   0.5249
      1     |  0.0859   0.0797   0.2616   0.2692   0.5072   0.5093
      2     |  0.0698   0.0639   0.2334   0.2427   0.4786   0.4840
550   0     |  0.1536   0.1453   0.4363   0.4410   0.7292   0.7347
      2/3   |  0.1337   0.1202   0.4072   0.4102   0.7072   0.7178
      1     |  0.1258   0.1096   0.3942   0.3984   0.6995   0.7074
      2     |  0.1095   0.0941   0.3734   0.3757   0.6832   0.6911

n     $\lambda_1$  |   i = 7    i = 10   i = 8    i = 11   i = 9    i = 12
100   0     |  0.0678   0.0724   0.1197   0.1370   0.2419   0.2880
      2/3   |  0.0572   0.0634   0.1050   0.1206   0.2183   0.2646
      1     |  0.0527   0.0597   0.0993   0.1148   0.2088   0.2562
      2     |  0.0511   0.0550   0.0931   0.1061   0.1949   0.2452
250   0     |  0.1798   0.1980   0.3597   0.3901   0.6588   0.7164
      2/3   |  0.1704   0.1861   0.3477   0.3739   0.6432   0.7010
      1     |  0.1661   0.1812   0.3418   0.3677   0.6363   0.6955
      2     |  0.1592   0.1750   0.3311   0.3557   0.6252   0.6859
400   0     |  0.3112   0.3369   0.5785   0.6124   0.8739   0.9190
      2/3   |  0.3037   0.3258   0.5683   0.6041   0.8685   0.9153
      1     |  0.3000   0.3229   0.5636   0.6009   0.8664   0.9137
      2     |  0.2917   0.3156   0.5544   0.5935   0.8610   0.9098
550   0     |  0.4249   0.4578   0.7272   0.7734   0.9630   0.9806
      2/3   |  0.4177   0.4510   0.7215   0.7671   0.9607   0.9798
      1     |  0.4147   0.4480   0.7191   0.7651   0.9598   0.9797
      2     |  0.4077   0.4431   0.7117   0.7593   0.9581   0.9789




Table 5

Size corrected average gradient $\gamma_n^{(\lambda_1,\lambda_2)}$ with $n \in \{100, 250\}$ (three test columns per sample size, in source order; the column headers naming the tests are illegible in the source)

$\lambda_2$   $\lambda_1$  |          n = 100            |           n = 250
0     0     |  1.9498   0.8109   1.8522  |   8.8777   2.2680    6.5096
      2/3   |  2.3191   0.8173   3.3489  |   9.5346   3.4695    8.5937
      1     |  2.3989   0.8030   3.3222  |   9.7044   3.8527    9.2039
      2     |  2.3323   0.7436   2.5181  |   9.4949   3.5113    9.0488
2/3   0     |  2.2460   0.2623   1.4553  |   8.4591   2.0100    6.5121
      2/3   |  3.3642   1.2907   4.5181  |  10.0525   3.4108   11.1638
      1     |  4.1761   1.8014   5.5197  |  10.8193   3.9618   13.1137
      2     |  6.1807   1.6047   6.4614  |  13.1578   5.3177   15.5109
1     0     |  2.0916   0.8894   1.2676  |   7.8486   1.7541    5.5833
      2/3   |  3.3562   0.8595   3.3604  |   9.8390   3.3627   10.5545
      1     |  4.2558   0.9214   3.7848  |  10.8838   4.1772   12.5275
      2     |  7.1583   1.0689   3.0949  |  14.5192   6.2040   15.9077



Table 6

Size corrected average gradient $\gamma_n^{(\lambda_1,\lambda_2)}$ with $n \in \{400, 550\}$ (three test columns per sample size, in source order; the column headers naming the tests are illegible in the source)

$\lambda_2$   $\lambda_1$  |           n = 400             |           n = 550
0     0     |  11.5475    5.1013   10.5516  |  15.1129    7.5141   13.4170
      2/3   |  12.0922    6.3045   12.1017  |  15.4706    8.6282   14.9108
      1     |  12.2447    6.6720   12.6871  |  15.5996    8.8334   15.0571
      2     |  12.1903    6.6276   12.6977  |  15.6361    8.8885   14.9610
2/3   0     |  13.0615    4.8961   10.9942  |  14.8684    7.3124   13.7563
      2/3   |  14.4991    6.3327   14.7299  |  16.0838    8.8237   17.1009
      1     |  15.2220    7.1010   16.2884  |  16.4405    9.5901   18.3816
      2     |  16.3217    8.7403   19.3697  |  18.0044   10.9238   21.2854
1     0     |  12.2295    4.3335   10.1186  |  14.4705    6.8205   12.7432
      2/3   |  14.4409    6.3769   14.1149  |  15.8989    8.8589   16.2596
      1     |  15.3240    7.4787   15.7674  |  16.4489   10.0339   17.8500
      2     |  17.4356   10.5383   19.5188  |  18.8438   12.4853   21.8208

Those simulated exact sizes which are taken as close according to Dale's criterion have been marked in blue in Tables 2 and 3, and the fairly close simulated exact sizes in red. Finally, it is concluded that the best overall choice for the test statistics $T^{(\lambda_1)}(\hat\theta^{(\lambda_2)})$ is $(\lambda_1, \lambda_2) \in \{(1, 1), (1, \tfrac{2}{3})\}$; however, for the smallest sample size ($n = 100$) the test statistic associated with $\lambda_1 = 1$ is not a good choice and $(\lambda_1, \lambda_2) \in \{(\tfrac{2}{3}, 1), (\tfrac{2}{3}, \tfrac{2}{3})\}$ is better.

To conclude, we would like to comment that LMLC could have been dealt with in this paper in a more general setting by following generalized log-linear models, $C \log(A m(\theta)) = X\theta$ (see Lang [18] and references therein). With these models it would be possible to consider loglinear constraints for the marginal distributions by considering $C = I_k$ and $A \neq I_k$. Furthermore, using minimum power divergence estimators, a different type of application for $\log(A m(\theta)) = X\theta$, with $A \neq I_k$ and Poisson sampling, can be found in Martin and Li [23].


References 



[1] Agresti, A. (2002). Categorical Data Analysis (Second Edition). Wiley, New York.
[2] Agresti, A. (2007). An Introduction to Categorical Data Analysis (Second Edition). Wiley, New York.
[3] Aitchison, J. and Silvey, S. D. (1958). Maximum likelihood estimation of parameters subject to constraints. Annals of Mathematical Statistics, 29, 813-828.
[4] Brockett, P. L. (1991). Information theoretic approach to actuarial science: a unification and extension of relevant theory and applications. Transactions of the Society of Actuaries, 43, 73-114.
[5] Caussinus, H. (1966). Contribution à l'analyse statistique des tableaux de corrélation. Annales de la Faculté des Sciences de Toulouse, 29, 77-182.
[6] Chatterjee, S. and Hadi, A. S. (2006). Regression Analysis by Example (Fourth Edition). John Wiley & Sons.
[7] Cressie, N. and Pardo, L. (2000). Minimum $\phi$-divergence estimator and hierarchical testing in loglinear models. Statistica Sinica, 10, 867-884.
[8] Cressie, N. and Pardo, L. (2002). Model checking in loglinear models using $\phi$-divergences and MLEs. Journal of Statistical Planning and Inference, 103, 437-453.
[9] Cressie, N. and Read, T. R. C. (1984). Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society, Series B, 46, 440-464.
[10] Christensen, R. (1997). Log-Linear Models and Logistic Regression (Second Edition). Springer-Verlag, New York.
[11] Dale, J. R. (1986). Asymptotic normality of goodness-of-fit statistics for sparse product multinomials. Journal of the Royal Statistical Society, Series B, 48, 48-59.
[12] Ferguson, T. S. (1996). A Course in Large Sample Theory. Chapman & Hall, London.
[13] Gail, M. (1978). The Analysis of Heterogeneity for Indirect Standardized Mortality Ratios. Journal of the Royal Statistical Society, Series A, 141, 224-234.
[14] Haber, M. and Brown, M. B. (1986). Maximum likelihood methods for log-linear models when expected frequencies are subject to linear constraints. Journal of the American Statistical Association, 81, 477-482.
[15] Haberman, S. J. (1973). Log-Linear Models For Frequency Data: Sufficient Statistics And Likelihood Equations. The Annals of Statistics, 1, 617-632.
[16] Haberman, S. J. (1974). The Analysis of Frequency Data. University of Chicago Press, Chicago.
[17] Kateri, M. and Agresti, A. (2007). A class of ordinal quasi-symmetry models for square contingency tables. Statistics & Probability Letters, 77, 598-603.
[18] Lang, J. B. (1996a). Maximum likelihood methods for a generalized class of log-linear models. The Annals of Statistics, 24, 726-752.






[19] Lang, J. B. (1996b). On the comparison of multinomial and Poisson log-linear models. Journal of the Royal Statistical Society, Series B, 58, 253-266.
[20] Lang, J. B. (2004). Multinomial-Poisson homogeneous models for contingency tables. The Annals of Statistics, 32, 340-383.
[21] Martin, N. and Pardo, L. (2008). Minimum Phi-divergence Estimators for Loglinear Models with Linear Constraints and Multinomial Sampling. Statistical Papers, 49, 15-36.
[22] Martin, N. and Pardo, L. (2008). New families of estimators and test statistics in loglinear models. Journal of Multivariate Analysis, 99, 1590-1609.
[23] Martin, N. and Li, Y. (2009). A new class of minimum power divergence estimators with applications to cancer surveillance. Harvard University Biostatistics Working Paper Series, 109. http://www.bepress.com/harvardbiostat/paper109
[24] Pardo, L. (2006). Statistical Inference Based on Divergence Measures. Chapman & Hall/CRC, Boca Raton.
[25] Pardo, J. A., Pardo, L. and Zografos, K. (2002). Minimum $\phi$-divergence estimators with constraints in multinomial populations. Journal of Statistical Planning and Inference, 104, 221-237.
[26] Pardo, L. and Martin, N. (2009). Homogeneity/Heterogeneity Hypotheses for Standardized Mortality Ratios Based on Minimum Power-divergence Estimators. Biometrical Journal, 51, 819-836.
[27] Rao, C. R. (1973). Linear Statistical Inference and Its Applications (Second Edition). John Wiley & Sons.
[28] Rao, C. R. and Toutenburg, H. (1999). Linear Models: Least Squares and Alternatives (Second Edition). Springer, New York.
[29] Rivas, M. J., Santos, M. T. and Morales, D. (1995). Rényi test statistics for partially observed diffusion processes. Journal of Statistical Planning and Inference, 127, 91-102.
[30] Searle, S. R. (1971). Linear Models. John Wiley & Sons, New York.
[31] Zelterman, D. (1999). Models for Discrete Data. Oxford University Press, New York.



